We propose a tightly-coupled, multi-core cluster architecture with shared, variation-tolerant, and accuracy-reconfigurable floating-point units (FPUs). The resilient shared-FPUs dynamically characterize FP pipeline vulnerability (FPV) and expose it as metadata to a software scheduler for reducing the cost of error correction. To further reduce this cost, our programming and runtime environment also supports controlled approximate computation through a combination of design-time and runtime techniques. We provide OpenMP extensions (as custom directives) for FP computations to specify parts of a program that can be executed approximately. We use a profiling technique to identify tolerable error significance and error rate thresholds in error-tolerant image processing applications. This information guides an application-driven hardware FPU synthesis and optimization design flow to generate efficient FPUs. At runtime, the scheduler utilizes FPV metadata and promotes FPUs to accurate mode, or demotes them to approximate mode depending upon the code region requirements. We demonstrate the effectiveness of our approach (in terms of energy savings) on a 16-core tightly-coupled cluster with eight shared-FPUs for both error-tolerant and general-purpose error-intolerant applications.
Abbas Rahimi, Andrea Marongiu, Rajesh K. Gupta, Luca Benini (2013). A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters. IEEE Computer Society [10.1109/CODES-ISSS.2013.6659022].
A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters
MARONGIU, ANDREA;BENINI, LUCA
2013
Abstract
We propose a tightly-coupled, multi-core cluster architecture with shared, variation-tolerant, and accuracy-reconfigurable floating-point units (FPUs). The resilient shared-FPUs dynamically characterize FP pipeline vulnerability (FPV) and expose it as metadata to a software scheduler for reducing the cost of error correction. To further reduce this cost, our programming and runtime environment also supports controlled approximate computation through a combination of design-time and runtime techniques. We provide OpenMP extensions (as custom directives) for FP computations to specify parts of a program that can be executed approximately. We use a profiling technique to identify tolerable error significance and error rate thresholds in error-tolerant image processing applications. This information guides an application-driven hardware FPU synthesis and optimization design flow to generate efficient FPUs. At runtime, the scheduler utilizes FPV metadata and promotes FPUs to accurate mode, or demotes them to approximate mode depending upon the code region requirements. We demonstrate the effectiveness of our approach (in terms of energy savings) on a 16-core tightly-coupled cluster with eight shared-FPUs for both error-tolerant and general-purpose error-intolerant applications.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.