Thermal and Energy Management of High-Performance Multicores: Distributed and Self-Calibrating Model-Predictive Controller

Andrea Bartolini; Matteo Cacciari; Andrea Tilli; Luca Benini
2013

Abstract

As a result of technology scaling, the power density of single-chip multi-core processors increases, and its spatial and temporal workload variation leads to temperature hot-spots, which may cause non-uniform ageing and accelerated chip failure. These critical issues can be tackled by closed-loop thermal and reliability management policies. Model predictive controllers (MPC) outperform classic feedback controllers because they can minimize performance loss while enforcing a safe working temperature. Unfortunately, MPC controllers rely on a-priori knowledge of thermal models, and their complexity grows exponentially with the number of controlled cores. In this paper we present a scalable, fully distributed, energy-aware thermal management solution for single-chip multi-core platforms. The complexity of the model-predictive controller is drastically reduced by splitting it into a set of simpler interacting controllers, each one allocated to a core in the system. Locally, each node selects the optimal frequency to meet temperature constraints while minimizing the performance penalty and system energy. Performance comparable to state-of-the-art MPC controllers is achieved by letting the controllers exchange a limited amount of information at run-time on a neighbourhood basis. In addition, we address model uncertainty by supporting learning of the thermal model with a novel distributed self-calibration approach that matches the controller architecture well.
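
The per-core decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' controller: it swaps the full MPC optimization for a one-step-ahead prediction, and the thermal model, the cubic power law, every coefficient, and the names `CoreModel`, `predict_temperature`, and `select_frequency` are assumptions made up for illustration. Each core uses only its own temperature and the temperatures reported by its neighbours, and picks the highest frequency level, not above the requested one, whose predicted temperature stays within the limit.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CoreModel:
    """Hypothetical first-order thermal model of one core (all values assumed)."""
    a: float = 0.05        # fraction of the gap to ambient closed per step
    c: float = 0.02        # coupling to each neighbouring core per step
    b: float = 0.5         # temperature rise per watt per step (K/W)
    k_dyn: float = 1.0     # dynamic power coefficient (W/GHz^3), assumed cubic law
    p_static: float = 1.0  # static power (W)


def core_power(model: CoreModel, f_ghz: float) -> float:
    """Assumed power model: cubic in frequency plus a static term."""
    return model.k_dyn * f_ghz ** 3 + model.p_static


def predict_temperature(model: CoreModel, t_core: float,
                        t_neighbours: Sequence[float], f_ghz: float,
                        t_ambient: float = 45.0) -> float:
    """One-step-ahead temperature prediction from local and neighbour data only."""
    exchange = sum(t_n - t_core for t_n in t_neighbours)
    return (t_core
            + model.a * (t_ambient - t_core)
            + model.c * exchange
            + model.b * core_power(model, f_ghz))


def select_frequency(model: CoreModel, t_core: float,
                     t_neighbours: Sequence[float], f_requested: float,
                     f_levels: Sequence[float], t_max: float) -> float:
    """Return the highest level <= f_requested whose prediction respects t_max.

    Never exceeding the requested frequency avoids spending energy the
    workload did not ask for; falling back to the lowest level is the
    most the controller can do when no level is predicted to be safe.
    """
    candidates = sorted((f for f in f_levels if f <= f_requested), reverse=True)
    for f in candidates:
        if predict_temperature(model, t_core, t_neighbours, f) <= t_max:
            return f
    return min(f_levels)


levels = [1.0, 1.5, 2.0, 2.5, 3.0]
model = CoreModel()
# A warm core with slightly cooler neighbours is throttled below the request.
throttled = select_frequency(model, 78.0, [75.0, 76.0], 3.0, levels, 80.0)
# A cool core is allowed to run at the requested frequency.
unthrottled = select_frequency(model, 60.0, [60.0, 60.0], 3.0, levels, 80.0)
```

Because each call needs only the neighbours' temperatures, the cost per core does not depend on the total number of cores, which is the scalability argument the abstract makes for the distributed scheme. A multi-step horizon and the self-calibration of the model coefficients are left out of this sketch.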

Use this identifier to cite or link to this document: http://hdl.handle.net/11585/132962