Power and energy consumption are becoming key challenges for the supercomputers' exascale race. HPC systems' processors waist active power during communication and synchronization among the MPI processes in large-scale HPC applications. However, due to the time scale at which communication happens, transitioning into low-power states while waiting for the completion of each communication may introduce unacceptable overhead. In this article, we present COUNTDOWN, a run-time library for identifying and automatically reducing the power consumption of the CPUs during communication and synchronization. COUNTDOWN saves energy without penalizing the time-to-completion by lowering CPUs power consumption only during idle times for which power state transition overhead is negligible. This is done transparently to the user, without requiring labor-intensive and error-prone application code modifications, nor requiring recompilation of the application. We test our methodology on a production Tier-1 system. For the NAS benchmarks, COUNTDOWN saves between 6 and 50 percent energy, with a time-to-solution penalty lower than 5 percent. In a complete production - Quantum ESPRESSO - for a 3.5K cores run, COUNTDOWN saves 22.36 percent energy, with a performance penalty below 3 percent. Energy saving increases to 37 percent with a performance penalty of 6.38 percent, if the application is executed without communication tuning.

COUNTDOWN: A Run-Time Library for Performance-Neutral Energy Saving in MPI Applications / Cesarini D.; Bartolini A.; Bonfa P.; Cavazzoni C.; Benini L.. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - ELETTRONICO. - 70:5(2021), pp. 9095224.682-9095224.695. [10.1109/TC.2020.2995269]

COUNTDOWN: A Run-Time Library for Performance-Neutral Energy Saving in MPI Applications

Bartolini A.;Benini L.
2021

Abstract

Power and energy consumption are becoming key challenges for the supercomputers' exascale race. HPC systems' processors waist active power during communication and synchronization among the MPI processes in large-scale HPC applications. However, due to the time scale at which communication happens, transitioning into low-power states while waiting for the completion of each communication may introduce unacceptable overhead. In this article, we present COUNTDOWN, a run-time library for identifying and automatically reducing the power consumption of the CPUs during communication and synchronization. COUNTDOWN saves energy without penalizing the time-to-completion by lowering CPUs power consumption only during idle times for which power state transition overhead is negligible. This is done transparently to the user, without requiring labor-intensive and error-prone application code modifications, nor requiring recompilation of the application. We test our methodology on a production Tier-1 system. For the NAS benchmarks, COUNTDOWN saves between 6 and 50 percent energy, with a time-to-solution penalty lower than 5 percent. In a complete production - Quantum ESPRESSO - for a 3.5K cores run, COUNTDOWN saves 22.36 percent energy, with a performance penalty below 3 percent. Energy saving increases to 37 percent with a performance penalty of 6.38 percent, if the application is executed without communication tuning.
2021
COUNTDOWN: A Run-Time Library for Performance-Neutral Energy Saving in MPI Applications / Cesarini D.; Bartolini A.; Bonfa P.; Cavazzoni C.; Benini L.. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - ELETTRONICO. - 70:5(2021), pp. 9095224.682-9095224.695. [10.1109/TC.2020.2995269]
Cesarini D.; Bartolini A.; Bonfa P.; Cavazzoni C.; Benini L.
File in questo prodotto:
File Dimensione Formato  
countdown post print.pdf

accesso aperto

Tipo: Postprint
Licenza: Licenza per accesso libero gratuito
Dimensione 3.53 MB
Formato Adobe PDF
3.53 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/863161
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 3
social impact