As side effects of the end of Dennard’s scaling, power and thermal technological walls stand in front of the evolution of supercomputers towards the exaflops era. Energy and temperature walls are big challenges to face for assuring a constant grow of performance in future. New generation architectures for HPC systems implement HW and SW components to address energy and thermal issues for increasing power and efficient computing in scientific workload. In thermal-bound HPC machines, workload-aware runtimes can leverage hardware knobs to guarantee the best operating point in term of performance and power saving without violating thermal constraints. In this paper, we present an integer-linear programming formulation for job mapping and frequency selection for thermal-bound HPC nodes. We use a fast solver and workload traces extracted from a real supercomputer to test our methodology. Our runtime is integrated into the MPI library, and it is capable of assigning high-performance cores to performance-critical processes. Critical processes are identified at execution time through a mathematical formulation, which relies on the characterization of the application workload and on the global synchronization barriers. We demonstrate that by combining long and short horizon predictions with information on the critical processes retrieved from the programming model, we can drastically improve the performance of the target application w.r.t. state-of-the-art DTM solutions.

Cesarini D., Bartolini A., Benini L. (2019). Modeling and Evaluation of Application-Aware Dynamic Thermal Control in HPC Nodes. Springer New York LLC [10.1007/978-3-030-15663-3_10].

Modeling and Evaluation of Application-Aware Dynamic Thermal Control in HPC Nodes

Bartolini A.;Benini L.
2019

Abstract

As side effects of the end of Dennard’s scaling, power and thermal technological walls stand in front of the evolution of supercomputers towards the exaflops era. Energy and temperature walls are big challenges to face for assuring a constant grow of performance in future. New generation architectures for HPC systems implement HW and SW components to address energy and thermal issues for increasing power and efficient computing in scientific workload. In thermal-bound HPC machines, workload-aware runtimes can leverage hardware knobs to guarantee the best operating point in term of performance and power saving without violating thermal constraints. In this paper, we present an integer-linear programming formulation for job mapping and frequency selection for thermal-bound HPC nodes. We use a fast solver and workload traces extracted from a real supercomputer to test our methodology. Our runtime is integrated into the MPI library, and it is capable of assigning high-performance cores to performance-critical processes. Critical processes are identified at execution time through a mathematical formulation, which relies on the characterization of the application workload and on the global synchronization barriers. We demonstrate that by combining long and short horizon predictions with information on the critical processes retrieved from the programming model, we can drastically improve the performance of the target application w.r.t. state-of-the-art DTM solutions.
2019
IFIP Advances in Information and Communication Technology. VLSI-SoC: Opportunities and Challenges Beyond the Internet of Things
198
219
Cesarini D., Bartolini A., Benini L. (2019). Modeling and Evaluation of Application-Aware Dynamic Thermal Control in HPC Nodes. Springer New York LLC [10.1007/978-3-030-15663-3_10].
Cesarini D.; Bartolini A.; Benini L.
File in questo prodotto:
File Dimensione Formato  
VLSISOC.pdf

Open Access dal 02/05/2021

Tipo: Postprint
Licenza: Licenza per accesso libero gratuito
Dimensione 1.4 MB
Formato Adobe PDF
1.4 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/788628
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact