
Continuous Learning of HPC Infrastructure Models using Big Data Analytics and In-Memory processing Tools / Beneventi, Francesco; Bartolini, Andrea; Cavazzoni, Carlo; Benini, Luca. - Electronic. - (2017), pp. 1038-1043. (Paper presented at the 20th Design, Automation and Test in Europe conference, DATE 2017, held at the SwissTech Convention Center, Lausanne, Switzerland, 27-31 March 2017) [10.23919/DATE.2017.7927143].

Continuous Learning of HPC Infrastructure Models using Big Data Analytics and In-Memory processing Tools

Beneventi, Francesco; Bartolini, Andrea; Cavazzoni, Carlo; Benini, Luca
2017

Abstract

Exascale computing represents the next leap in the HPC race. Reaching this level of performance poses several engineering challenges, such as energy consumption, equipment cooling, reliability and massive parallelism. Model-based optimization is an essential tool in the design and control of energy-efficient, reliable and thermally constrained systems. However, in the Exascale domain, model-learning techniques tailored to a specific supercomputer require real measurements and must therefore handle and analyze a massive amount of data coming from the HPC monitoring infrastructure. This rapidly becomes a 'big data'-scale problem. The common approach, in which measurements are first stored in large databases and then processed, is no longer affordable because of increasing storage costs and the lack of real-time support. Instead, cloud-based machine learning techniques now aim to build on-line models using real-time approaches such as 'stream processing' and 'in-memory' computing, which avoid storage costs and enable fast data processing. Moreover, the fast delivery of models and their quick adaptation to data variations make the decision stage of the optimization loop more effective and reliable. In this paper we leverage scalable, lightweight and flexible IoT technologies, such as the MQTT protocol, to build a highly scalable HPC monitoring infrastructure able to handle the massive sensor data produced by next-generation HPC components. We then show how state-of-the-art tools for big data computing and analysis, such as Apache Spark, can be used to manage the huge amount of data delivered by the monitoring layer and to build adaptive models in real time using on-line machine learning techniques.
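The abstract does not say which on-line learner the authors use. As a hedged illustration of the general idea (a model updated as each monitoring message arrives, instead of being refit from a database), the following stdlib-only Python sketch routes MQTT-style messages to per-node models and fits an exponentially weighted linear regression of node power on CPU load. The topic layout `node/<node_name>/sample`, the `<cpu_load>,<power_watts>` payload, and the names `EWLinearModel` and `on_message` are hypothetical; exponentially weighted regression is a simple stand-in chosen for illustration, not the method of the paper. In a real deployment, `on_message` would be called from an MQTT client's message callback or replaced by a Spark streaming job.

```python
from dataclasses import dataclass


@dataclass
class EWLinearModel:
    """On-line simple linear regression y ~ slope * x + intercept.

    Exponential forgetting (weight alpha on each new sample) lets the model
    follow drifts in the data; this estimator is an illustrative choice.
    """
    alpha: float = 0.05
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    var_x: float = 0.0
    cov_xy: float = 0.0

    def update(self, x: float, y: float) -> None:
        """Fold one (x, y) sample into the running statistics in O(1)."""
        if self.n == 0:
            # First sample only initialises the means; no spread yet.
            self.mean_x, self.mean_y = x, y
        else:
            dx, dy = x - self.mean_x, y - self.mean_y
            self.mean_x += self.alpha * dx
            self.mean_y += self.alpha * dy
            # Incremental exponentially weighted variance and covariance.
            self.var_x = (1 - self.alpha) * (self.var_x + self.alpha * dx * dx)
            self.cov_xy = (1 - self.alpha) * (self.cov_xy + self.alpha * dx * dy)
        self.n += 1

    @property
    def slope(self) -> float:
        return self.cov_xy / self.var_x if self.var_x > 0 else 0.0

    @property
    def intercept(self) -> float:
        return self.mean_y - self.slope * self.mean_x

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def on_message(models: dict[str, EWLinearModel], topic: str, payload: bytes) -> None:
    """Route one monitoring message to the model of the node that sent it.

    Hypothetical topic layout: "node/<node_name>/sample"
    Hypothetical payload:      b"<cpu_load>,<power_watts>"
    """
    parts = topic.split("/")
    if len(parts) != 3 or parts[0] != "node" or parts[2] != "sample":
        return  # not a sample message: ignore it
    load, power = (float(v) for v in payload.decode().split(","))
    model = models.get(parts[1])
    if model is None:
        model = models[parts[1]] = EWLinearModel()
    model.update(load, power)


if __name__ == "__main__":
    models: dict[str, EWLinearModel] = {}
    # Synthetic stream: node01 draws 100 W idle plus 2 W per % of CPU load.
    for i in range(200):
        load = float(i % 50)
        on_message(models, "node/node01/sample", f"{load},{100 + 2 * load}".encode())
    m = models["node01"]
    print(f"node01: slope={m.slope:.3f} W/%, intercept={m.intercept:.3f} W")
```

The forgetting factor `alpha` trades stability for responsiveness: a larger value tracks a change in a node's power behavior within a few dozen samples but gives noisier estimates, while a smaller value averages over a longer history and adapts more slowly.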
Proceedings of the 2017 Design, Automation and Test in Europe, DATE 2017 (2017), pp. 1038-1043
Files in this item:
2016_DATE_Beneventi_FP.pdf (Postprint, open access; License: free open access; Adobe PDF, 1.35 MB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/613826
Citations
  • PMC: not available
  • Scopus: 49
  • ISI: 35