Accurate workload power prediction is essential to addressing the environmental sustainability challenges of modern high-performance computing (HPC) systems. While existing Machine Learning (ML) approaches are effective, they retain some limitations in production environments. To address these, we introduce UoPC, a user-based online framework for predicting job power consumption in HPC systems. UoPC leverages ML-based predictive models tailored to individual users, eliminating the need for voluminous data and training. It offers a user-friendly Python implementation suitable both for end users and for integration into workload management systems. We evaluate UoPC on more than 1.3 million jobs executed on Fugaku, a supercomputer hosted at RIKEN, demonstrating its effectiveness. It achieves a prediction error of only 10%, with minimal overhead on system operations. By employing a k-nearest neighbours (KNN) prediction model augmented with Natural Language Processing (NLP), UoPC streamlines the prediction process for newly submitted jobs. It requires only limited historical data, making it practical for diverse HPC environments and workloads.
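The idea described above, a per-user KNN model over text-encoded job information that is updated online, can be illustrated with a minimal sketch. This is not the UoPC implementation: the class name `PerUserKNN`, the bag-of-words encoding, the cosine similarity measure, and the job description strings are all assumptions made for illustration, and the abstract does not specify which job features or NLP encoding UoPC actually uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a job description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PerUserKNN:
    """Hypothetical per-user KNN regressor over bag-of-words job descriptions."""

    def __init__(self, k=3):
        self.k = k
        # One independent history per user: user -> [(token counts, power in watts)]
        self.history = defaultdict(list)

    def update(self, user, job_text, power_watts):
        """Online step: record a finished job and its measured power."""
        self.history[user].append((Counter(tokenize(job_text)), power_watts))

    def predict(self, user, job_text):
        """Average the power of the user's k most similar past jobs."""
        past = self.history.get(user)
        if not past:
            return None  # no history yet for this user
        query = Counter(tokenize(job_text))
        ranked = sorted(past, key=lambda item: cosine(query, item[0]), reverse=True)
        nearest = ranked[: self.k]
        return sum(power for _, power in nearest) / len(nearest)


# Illustrative data only; these jobs and power values are made up.
model = PerUserKNN(k=2)
model.update("alice", "lammps nodes 4 cores 48", 100.0)
model.update("alice", "lammps nodes 4 cores 96", 120.0)
model.update("alice", "gromacs nodes 64 cores 48", 400.0)
prediction = model.predict("alice", "lammps nodes 4 cores 48")
```

Because each user has a separate history, a prediction for a new job only needs that user's handful of past jobs, which is consistent with the abstract's claim that limited historical data suffices. A user with no recorded jobs gets `None` in this sketch; how UoPC handles that case is not stated in the abstract.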
Antici, F., Borghesi, A., Domke, J., Kiziltan, Z. (2025). UoPC: A User-Based Online Framework to Predict Job Power Consumption in HPC Systems.
UoPC: A User-Based Online Framework to Predict Job Power Consumption in HPC Systems
Antici F;Borghesi A;Kiziltan Z
2025


