FINJ: A Fault Injection Tool for HPC Systems / Alessio Netti; Zeynep Kiziltan; Ozalp Babaoglu; Alina Sirbu; Andrea Bartolini; Andrea Borghesi. - PRINT. - 11339:(2019), pp. 800-812. (Paper presented at Euro-Par 2018: European Conference on Parallel Processing Workshops, held in Turin, Italy, August 27-28, 2018) [10.1007/978-3-030-10549-5_62].

FINJ: A Fault Injection Tool for HPC Systems

Alessio Netti; Zeynep Kiziltan; Ozalp Babaoglu; Alina Sirbu; Andrea Bartolini; Andrea Borghesi
2019

Abstract

We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC) systems, with a focus on the management of complex experiments. FINJ supports custom workloads and generates anomalous conditions by running fault-triggering executable programs. It can also be integrated with most lower-level fault injection tools, allowing users to create and monitor a variety of complex and diverse fault conditions in HPC systems that would be difficult to recreate in practice. FINJ is suitable for experiments involving many, potentially interacting nodes, making it a versatile design and evaluation tool.
Euro-Par 2018: Parallel Processing Workshops, pp. 800-812
Files in this record:
File: Netti_Europar2018_FINJ_postprint.pdf
Access: open access
Type: Postprint
License: Free open-access license
Size: 617.46 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/668910