Beneventi F., Bartolini A., Tilli A., Benini L. (2014). An Effective Gray-Box Identification Procedure for Multicore Thermal Modelling. IEEE Transactions on Computers, 63(5), 1097-1110 [10.1109/TC.2012.293].
An Effective Gray-Box Identification Procedure for Multicore Thermal Modelling
Francesco Beneventi, Andrea Bartolini, Andrea Tilli, Luca Benini
2014
Abstract
Aggressive thermal management is a critical feature for high-end computing platforms, as worst-case thermal budgeting is becoming unaffordable. Reactive thermal management, which sets temperature thresholds to trigger thermal capping actions, is too “near-sighted”: it may lead to severe performance degradation and thermal overshoots. More aggressive, proactive thermal management techniques minimize the performance penalty through smooth optimal control. These techniques require thermal models that are accurate enough to make the control effective, yet simple enough to keep its complexity limited. In practice, these models are not provided by manufacturers, and in most cases they strongly depend on the deployment environment. Hence, procedures that automatically derive thermal models in the field are needed. In this paper, we propose a gray-box procedure to learn a compact and physically consistent model for multicore chips. We leverage the physical consistency of the proposed model to tame the model complexity and to cope with the large quantization noise in the measurements. We exploit Output Error (OE) model structures along with Levenberg-Marquardt and Least Squares optimization algorithms. We tackle the problem in a real-life context: we developed a complete infrastructure for model building and thermal data collection in the Linux environment, and we tested it on an Intel Nehalem-based server CPU.
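The following minimal Python sketch illustrates the identification ideas named above. It rests on strong simplifying assumptions, so it is not the paper's multicore procedure:

- There is a single hypothetical thermal node, modeled as x(k+1) = a·x(k) + b·P(k), where x is the temperature rise above ambient and P is power.
- The data are synthetic rather than real sensor traces.
- Rounding the output to whole degrees stands in for coarse on-chip sensors.
- Physical consistency is encoded as 0 < a < 1 and b > 0, which a discretized RC cell would satisfy.
- A linear least-squares (equation-error) fit supplies the initial guess. Levenberg-Marquardt then minimizes the Output Error simulation cost and rejects any step that leaves the physical region.

The function names, parameter values, and this division of labor between the two algorithms are illustrative choices.

```python
import math
import random


def simulate(a, b, power, x0):
    """Simulate x(k+1) = a*x(k) + b*P(k) from x(0) = x0 and return the
    trajectory plus its sensitivities dx/da and dx/db (used by LM)."""
    x, sa, sb = x0, 0.0, 0.0
    xs, das, dbs = [x], [sa], [sb]
    for p in power[:-1]:
        # Sensitivities depend on the previous state, so update them first.
        sa = x + a * sa
        sb = p + a * sb
        x = a * x + b * p
        xs.append(x)
        das.append(sa)
        dbs.append(sb)
    return xs, das, dbs


def oe_cost(a, b, power, y):
    """Output Error cost: squared mismatch between measurements and the
    model's free-run simulation (no measured outputs fed back)."""
    xs, _, _ = simulate(a, b, power, y[0])
    return sum((yk - xk) ** 2 for yk, xk in zip(y, xs))


def is_physical(a, b):
    """Hypothetical RC cell: a = exp(-Ts/(R*C)) lies in (0, 1), b = R*(1 - a) > 0."""
    return 0.0 < a < 1.0 and b > 0.0


def least_squares_init(power, y):
    """Equation-error fit of y(k+1) = a*y(k) + b*P(k) via 2x2 normal equations."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for k in range(len(y) - 1):
        u, v, w = y[k], power[k], y[k + 1]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        t1 += u * w
        t2 += v * w
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def levenberg_marquardt(a, b, power, y, iters=50, lam=1e-3):
    """Minimize the OE cost; steps leaving the physical region are rejected."""
    a, b = min(max(a, 1e-3), 1.0 - 1e-3), max(b, 1e-3)  # project the guess
    cost = oe_cost(a, b, power, y)
    for _ in range(iters):
        xs, da, db = simulate(a, b, power, y[0])
        r = [yk - xk for yk, xk in zip(y, xs)]
        jaa = sum(v * v for v in da)
        jab = sum(u * v for u, v in zip(da, db))
        jbb = sum(v * v for v in db)
        ga = sum(u * v for u, v in zip(da, r))
        gb = sum(u * v for u, v in zip(db, r))
        while True:
            # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
            m11, m22 = jaa * (1.0 + lam), jbb * (1.0 + lam)
            det = m11 * m22 - jab * jab
            na = a + (ga * m22 - gb * jab) / det
            nb = b + (m11 * gb - jab * ga) / det
            new = oe_cost(na, nb, power, y) if is_physical(na, nb) else math.inf
            if new < cost:
                a, b, cost, lam = na, nb, new, lam / 10.0
                break
            lam *= 10.0
            if lam > 1e12:  # no improving step left: treat as converged
                return a, b, cost
    return a, b, cost


def make_data(a=0.9, b=0.5, n=400, seed=1):
    """Synthetic trace: random power steps (W) drive the true model and the
    readout is rounded to whole degrees, mimicking a coarse sensor."""
    rng = random.Random(seed)
    power, p = [], 0.0
    for k in range(n):
        if k % 20 == 0:
            p = rng.uniform(2.0, 10.0)
        power.append(p)
    xs, _, _ = simulate(a, b, power, 0.0)
    return power, [float(round(x)) for x in xs]


if __name__ == "__main__":
    power, y = make_data()
    a0, b0 = least_squares_init(power, y)
    a_hat, b_hat, cost = levenberg_marquardt(a0, b0, power, y)
    print(f"LS init: a={a0:.4f} b={b0:.4f} | OE/LM: a={a_hat:.4f} b={b_hat:.4f}")
```

The two stages make different trade-offs:

- **Equation-error fit:** cheap and linear, but it uses the quantized outputs themselves as regressors.
- **OE cost:** compares the measurements only against the model's own simulated trajectory, so quantization noise enters just the residual. The price is a nonlinear problem that needs an iterative solver such as Levenberg-Marquardt.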