Automated and data-driven methodologies are being introduced to assist system administrators in managing increasingly complex modern HPC systems. Anomaly detection (AD) is an integral part of improving overall system availability, as it eases the system administrators' burden and reduces the time between an anomaly and its resolution. This work improves upon the current state-of-the-art (SoA) AD model by considering temporal dependencies in the data and including long short-term memory (LSTM) cells in the architecture of the AD model. The proposed model is evaluated on a complete ten-month history of a Tier-0 system (Marconi100 from CINECA, consisting of 985 nodes). The proposed model achieves an area under the curve (AUC) of 0.758, improving upon the state-of-the-art approach, which achieves an AUC of 0.747.
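For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC value can be computed from per-sample anomaly scores. It assumes that the curve in question is the ROC curve (the abstract only says "area under the curve") and that the model emits one scalar score per sample, with higher meaning more anomalous. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for anomalous samples, 0 for normal samples (assumed encoding).
    scores: anomaly scores, higher means more anomalous (assumed convention).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")

    # Count how often an anomalous sample is scored above a normal one;
    # ties count as half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up values (not results from the paper).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this reading, an AUC of 0.5 corresponds to a detector that ranks anomalous and normal samples no better than chance, and 1.0 to a perfect ranking, which gives some context for the reported 0.758 versus 0.747.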

Molan M., Borghesi A., Benini L., Bartolini A. (2022). Semi-supervised anomaly detection on a Tier-0 HPC system [10.1145/3528416.3530867].

Semi-supervised anomaly detection on a Tier-0 HPC system

Molan M.; Borghesi A.; Benini L.; Bartolini A.
2022

Published in: CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers, pp. 203-204

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/901546