Pietro Mercati, Andrea Bartolini, Francesco Paterna, Tajana Simunic Rosing, Luca Benini (2013). Workload and user experience-aware dynamic reliability management in multicore processors. Proceedings of the 50th Annual Design Automation Conference (DAC '13). 2013 IEEE Conference Proceedings [10.1145/2463209.2488735].
Workload and user experience-aware dynamic reliability management in multicore processors
Proceedings of the 50th Annual Design Automation Conference (DAC '13)
BARTOLINI, ANDREA; PATERNA, FRANCESCO; BENINI, LUCA
2013
Abstract
Reliability is a major concern for nanoscale CMOS circuits. Degradation phenomena such as electromigration, negative bias temperature instability, and time-dependent dielectric breakdown worsen with transistor scaling. Dynamic Reliability Management (DRM) techniques reduce reliability loss at runtime by constraining operating points, but they face the challenge of limiting user experience degradation while meeting a lifetime target. In this work we propose a sensor-based hierarchical controller for multicore processor DRM that exploits the large gap between the time scales of workload variations and reliability loss. We improve performance and user experience by locally relaxing reliability-induced operating point constraints, while still meeting them over the large time windows relevant for reliability. Compared with the state of the art, our solution guarantees timely execution of 100% of latency-critical applications and achieves a 4% performance improvement over the whole lifetime.
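The idea of relaxing a constraint locally while meeting it over a long window can be illustrated with a toy budget scheduler. This is only a minimal sketch under stated assumptions, not the controller proposed in the paper: the three operating points, their per-epoch wear values, and the function name `run_window` are all hypothetical, and real reliability loss would come from sensors and degradation models rather than fixed numbers.

```python
# Hypothetical wear per short epoch at each operating point (arbitrary units).
DAMAGE = {"low": 0.5, "nominal": 1.0, "boost": 2.0}


def run_window(workload, budget):
    """Pick an operating point per epoch while keeping total wear within budget.

    workload: list of bools, True when the epoch runs latency-critical work.
    budget:   wear allowed over the whole window; assumed to be at least
              len(workload) * min(DAMAGE.values()), so "low" is always feasible.
    Returns (chosen_levels, total_wear).
    """
    floor = min(DAMAGE.values())
    fastest_first = sorted(DAMAGE, key=DAMAGE.get, reverse=True)
    spent = 0.0
    chosen = []
    for i, critical in enumerate(workload):
        epochs_after = len(workload) - i - 1
        remaining = budget - spent
        # Hard cap: leave enough budget for every later epoch to run at "low".
        cap = remaining - epochs_after * floor
        if not critical:
            # Non-critical epochs only get the average share of what is left,
            # which pays back any earlier relaxation.
            cap = min(cap, remaining / (epochs_after + 1))
        level = next((name for name in fastest_first if DAMAGE[name] <= cap), "low")
        chosen.append(level)
        spent += DAMAGE[level]
    return chosen, spent


if __name__ == "__main__":
    levels, wear = run_window([True, False, False, False], budget=4.0)
    print(levels, wear)
```

In this toy run the latency-critical first epoch is allowed to boost, and the following non-critical epochs run slower so that total wear over the window does not exceed the budget. The design choice being illustrated is the separation of time scales: decisions are made per epoch, but the constraint is enforced only on the window total.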