Energy-efficient GPGPU architectures via collaborative compilation and memristive memory-based computing / Rahimi, Abbas; Ghofrani, Amirali; Angel, Miguel; Cheng, Kwang-Ting; Benini, Luca; Gupta, Rajesh K.. - PRINT. - (2014), pp. 2593132.1-2593132.6. (Paper presented at the 51st Annual Design Automation Conference, DAC 2014, held in San Francisco, CA, USA, in 2014) [10.1145/2593069.2593132].
Energy-efficient GPGPU architectures via collaborative compilation and memristive memory-based computing
BENINI, LUCA;
2014
Abstract
Thousands of deep and wide pipelines working concurrently make GPGPUs highly power-consuming parts. Energy-efficiency techniques employ voltage overscaling, which increases timing sensitivity to variations and hence aggravates the energy use issues. This paper proposes a method to increase spatiotemporal reuse of computational effort by a combination of compilation and micro-architectural design. An associative memristive memory (AMM) module is integrated with the floating point units (FPUs). Together, these enable fine-grained partitioning of values and find high-frequency sets of values for the FPUs by searching the space of possible inputs, with the help of application-specific profile feedback. For every kernel execution, the compiler pre-stores these high-frequency sets of values in AMM modules (representing partial functionality of the associated FPU) that are concurrently evaluated over two clock cycles. Our simulation results show high hit rates with 32-entry AMM modules that enable a 36% reduction in average energy use by the kernel codes. Compared to voltage overscaling, this technique enhances robustness against timing errors with a 39% average energy saving.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
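
The reuse idea in the abstract can be illustrated with a small functional model. This is only a sketch under stated assumptions, not the authors' implementation: the names `build_amm` and `run_kernel`, the dictionary standing in for the associative memory, and the sample traces are all hypothetical, and the hardware aspects (concurrent evaluation over two clock cycles, fine-grained partitioning of values, voltage overscaling) are not modeled. Only the 32-entry table size comes from the abstract.

```python
from collections import Counter
import operator

AMM_ENTRIES = 32  # table size taken from the abstract; everything else is illustrative


def build_amm(profile_trace, op, entries=AMM_ENTRIES):
    """Pre-store results for the most frequent operand tuples seen in profiling."""
    counts = Counter(profile_trace)
    return {operands: op(*operands) for operands, _ in counts.most_common(entries)}


def run_kernel(trace, op, amm):
    """Evaluate each operation, reusing a stored result on a table hit."""
    results, hits = [], 0
    for operands in trace:
        if operands in amm:      # hit: reuse the pre-stored result
            results.append(amm[operands])
            hits += 1
        else:                    # miss: fall back to computing the operation
            results.append(op(*operands))
    hit_rate = hits / len(trace) if trace else 0.0
    return results, hit_rate


# Hypothetical profiling trace of multiplier operands, then a kernel run.
profile = [(1.0, 2.0)] * 5 + [(0.5, 0.5)] * 3 + [(3.0, 4.0)]
amm = build_amm(profile, operator.mul, entries=2)
kernel_trace = [(1.0, 2.0), (3.0, 4.0), (0.5, 0.5), (1.0, 2.0)]
results, hit_rate = run_kernel(kernel_trace, operator.mul, amm)
```

In this toy run the table holds the two most frequent operand pairs from profiling, so three of the four kernel operations are served from the table and one falls back to computation. In the scheme the abstract describes, such hits are where computational effort is reused; the energy figures themselves depend on hardware details this model does not capture.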