General-purpose graphic processing units (GP-GPUs) offer high computational throughput using thousands of integrated processing elements (PEs). These PEs are stressed during workload execution, and negative bias temperature instability (NBTI) adversely affects their reliability by introducing new delay-induced faults. However, the effect of these delay variations is not uniformly spread across the PEs: some are affected more - hence less reliable - than others. This variation causes significant reductionin the lifetime of GP-GPU parts. In this article, we address the problem of "wear leveling" across processing units to mitigate lifetime uncertainty in GP-GPUs. We propose innovations in the static compiled code that can improve healing in PEs and stream cores (SCs) based on their degradation status. PE healing is a fine-grained very long instruction word (VLIW) slot assignment scheme that balances the stress of instructions across the PEs within an SC. SC healing is a coarse-grained workload allocation scheme that distributes workload across SCs in GP-GPUs. Both schemes share a common property: they adaptively shift workload from less reliable units to more reliable units, either spatially or temporally. These software schemes are based on online calibration with NBTI monitoring that equalizes the expected lifetime of PEs and SCs by regenerating adaptive compiled codes to respond to the specific health state of the GP-GPUs. We evaluate the effectiveness of the proposed schemes for various OpenCL kernels from the AMD APP SDK on Evergreen and Southern Island GPU architectures. The aging-aware healthy kernels generated by the PE (or SC) healing scheme reduce NBTI-induced voltage threshold shift by 30% (77% in the case of SCs), with no (moderate) performance penalty compared to the naive kernels.

Aging-aware compilation for GP-GPUs / Lotfi, Atieh; Rahimi, Abbas; Benini, Luca; Gupta, Rajesh K.. - In: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION. - ISSN 1544-3566. - STAMPA. - 12:2(2015), pp. 24.1-24.20. [10.1145/2778984]

Aging-aware compilation for GP-GPUs

BENINI, LUCA;
2015

Abstract

General-purpose graphic processing units (GP-GPUs) offer high computational throughput using thousands of integrated processing elements (PEs). These PEs are stressed during workload execution, and negative bias temperature instability (NBTI) adversely affects their reliability by introducing new delay-induced faults. However, the effect of these delay variations is not uniformly spread across the PEs: some are affected more - hence less reliable - than others. This variation causes significant reductionin the lifetime of GP-GPU parts. In this article, we address the problem of "wear leveling" across processing units to mitigate lifetime uncertainty in GP-GPUs. We propose innovations in the static compiled code that can improve healing in PEs and stream cores (SCs) based on their degradation status. PE healing is a fine-grained very long instruction word (VLIW) slot assignment scheme that balances the stress of instructions across the PEs within an SC. SC healing is a coarse-grained workload allocation scheme that distributes workload across SCs in GP-GPUs. Both schemes share a common property: they adaptively shift workload from less reliable units to more reliable units, either spatially or temporally. These software schemes are based on online calibration with NBTI monitoring that equalizes the expected lifetime of PEs and SCs by regenerating adaptive compiled codes to respond to the specific health state of the GP-GPUs. We evaluate the effectiveness of the proposed schemes for various OpenCL kernels from the AMD APP SDK on Evergreen and Southern Island GPU architectures. The aging-aware healthy kernels generated by the PE (or SC) healing scheme reduce NBTI-induced voltage threshold shift by 30% (77% in the case of SCs), with no (moderate) performance penalty compared to the naive kernels.
2015
Aging-aware compilation for GP-GPUs / Lotfi, Atieh; Rahimi, Abbas; Benini, Luca; Gupta, Rajesh K.. - In: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION. - ISSN 1544-3566. - STAMPA. - 12:2(2015), pp. 24.1-24.20. [10.1145/2778984]
Lotfi, Atieh; Rahimi, Abbas; Benini, Luca; Gupta, Rajesh K.
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/545016
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 8
  • ???jsp.display-item.citation.isi??? 8
social impact