In recent decades, High Performance Computing (HPC) systems have accelerated scientific discovery and innovation across domains ranging from epidemic studies to climate science. For HPC systems to develop sustainably, their environmental impact, in terms of carbon footprint and energy requirements, must be addressed while maintaining high system throughput. Analyzing and predicting HPC job execution characteristics is instrumental in developing workload management strategies that simultaneously optimize system throughput and minimize environmental impact. However, the development of accurate predictive models is hindered by the lack of large public datasets. In this paper, we present F-DATA, a public dataset containing information on around 24 million jobs executed on Fugaku, the most powerful supercomputer during the data collection phase. The dataset contains an extensive set of features, allowing a wide range of job characteristics to be predicted. Sensitive job data is provided in both anonymized and irreversibly encoded versions. The encoding is based on a Natural Language Processing model and retains sensitive but useful job information for prediction purposes without violating privacy concerns.

Antici, F., Bartolini, A., Domke, J., Kiziltan, Z., Yamamoto, K. (2025). F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems. Scientific Data, 12(1), 1-13. doi:10.1038/s41597-025-05633-1.

F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems

Antici F.; Bartolini A.; Kiziltan Z.
2025

Antici, F.; Bartolini, A.; Domke, J.; Kiziltan, Z.; Yamamoto, K.
Files in this record:
File: s41597-025-05633-1.pdf
Access: open access
Description: publisher's version
Type: Publisher's version (PDF) / Version Of Record
License: Creative Commons
Size: 1.81 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1030290