Playing with fire: Transactional memory revisited for error-resilient and energy-efficient MPSOC execution

Benini, Luca; Marongiu, Andrea
2015

Abstract

As silicon integration technology pushes toward atomic dimensions, errors due to static and dynamic variability are an increasing concern. To avoid such errors, designers often turn to "guardband" restrictions on the operating frequency and voltage. If guardbands are too conservative, they limit performance and waste energy, but less conservative guardbands risk moving the system closer to its Critical Operating Point (COP), a frequency-voltage pair that, if surpassed, causes massive instruction failures. In this paper, we propose a novel scheme that allows the system to adjust dynamically to an evolving COP and operate at highly reduced margins, while guaranteeing forward progress. Specifically, our scheme dynamically monitors the platform and adaptively adjusts to the COP among multiple cores, using lightweight checkpointing and roll-back mechanisms adopted from Hardware Transactional Memory (HTM) for error recovery. Experiments demonstrate that our technique is particularly effective in saving energy while also offering safe execution guarantees. To the best of our knowledge, this work is the first to describe a full-fledged HTM implementation for error-resilient and energy-efficient MPSoC execution.
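The checkpoint-and-rollback idea in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the paper's implementation: the `Core` class, the `execute` policy, the millivolt values, and the rule "an error occurs whenever the voltage is below the COP" are all hypothetical stand-ins for real hardware error detection and HTM support.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy model of one core; every field is a hypothetical illustration."""
    voltage_mv: int                      # current supply voltage
    cop_mv: int                          # hidden critical voltage (assumed fixed here)
    state: dict = field(default_factory=dict)

    def run_transaction(self, work):
        """Run `work` speculatively; commit only if no error is detected."""
        checkpoint = dict(self.state)        # lightweight checkpoint (shallow copy)
        work(self.state)                     # speculative execution
        if self.voltage_mv < self.cop_mv:    # stand-in for a hardware error detector
            self.state = checkpoint          # roll back to the checkpoint
            return False
        return True                          # commit


def execute(core, work, n_transactions, step_mv=10):
    """Shrink the voltage margin after commits; on an error, back off and retry."""
    retries = 0
    floor_mv = 0                             # lowest voltage known to commit safely
    for _ in range(n_transactions):
        while not core.run_transaction(work):
            core.voltage_mv += step_mv       # move away from the COP
            floor_mv = core.voltage_mv       # do not probe below this level again
            retries += 1
        if core.voltage_mv - step_mv >= floor_mv:
            core.voltage_mv -= step_mv       # success: try a smaller margin
    return retries
```

With a core started at 1000 mV, an assumed COP of 950 mV, and a work function that increments a counter, 20 transactions all commit exactly once: the single failed attempt at 940 mV is rolled back and retried, and the voltage then settles at 950 mV. Because every transaction is retried until it commits, forward progress is preserved in this toy model. A real scheme would also have to track a COP that changes over time, which this sketch does not attempt.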
Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI, pp. 9-14
Papagiannopoulou, D., Benini, L., Marongiu, A., Herlihy, M., Moreshet, T., Bahar, I. (2015). Playing with fire: Transactional memory revisited for error-resilient and energy-efficient MPSOC execution. Association for Computing Machinery [10.1145/2742060.2742090].
Papagiannopoulou, Dimitra; Benini, Luca; Marongiu, Andrea; Herlihy, Maurice; Moreshet, Tali; Bahar, Iris

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/545253