Prediction of Thermal Hazards in a Real Datacenter Room Using Temporal Convolutional Networks

Seyedkazemi Ardebili M.; Zanghieri M.; Burrello A.; Beneventi F.; Acquaviva A.; Benini L.; Bartolini A.
2021

Abstract

Datacenters play a vital role in today's society. Broadly, a datacenter room is a complex controlled environment composed of thousands of computing nodes, which consume kW of power. To dissipate this power, forced air or liquid flow is employed, at a cost of millions of euros per year. Reducing this cost involves free cooling and average-case design, which can create cooling shortages and thermal hazards. When a thermal hazard occurs, the system administrators and the facility manager must stop production to avoid damage and wear-out of the IT equipment. In this paper, we study the signatures of thermal hazards in the monitored data of a Tier-0 datacenter room over a full year of production. We define a set of rules for detecting thermal hazards based on the inlet and outlet temperatures of all nodes in the room. We then propose a custom Temporal Convolutional Network (TCN) to predict the hazards in advance. The results show that our TCN predicts thermal hazards with an F1-score of 0.98 on a randomly sampled test set. When causality is enforced between the training and validation sets, the F1-score drops to 0.74, which calls for in-place online retraining of the network and motivates further research in this context.
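
The gap between the two reported F1-scores comes from how the data are split. As a minimal sketch (not the authors' code), the Python below shows the standard binary F1-score and contrasts a random split with a chronological, causality-preserving split. The function names, the 20% test fraction, and the use of integer window indices are illustrative assumptions; the abstract does not specify the paper's windowing, labeling rules, or split ratios.

```python
import random


def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN), with label 1 meaning 'hazard'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def random_split(n_samples, test_fraction=0.2, seed=0):
    """Random split: test windows are interleaved in time with training windows."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(n_samples * test_fraction)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def chronological_split(n_samples, test_fraction=0.2):
    """Causal split: every test window is strictly later than every training window."""
    n_train = n_samples - int(n_samples * test_fraction)
    return list(range(n_train)), list(range(n_train, n_samples))


if __name__ == "__main__":
    # Indices stand for time-ordered windows of monitored data (hypothetical).
    train_r, test_r = random_split(10)
    train_c, test_c = chronological_split(10)
    print("random split test indices:       ", test_r)
    print("chronological split test indices:", test_c)
    print("F1 on a toy prediction:", f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

With a random split, a test window can sit between two training windows from the same hazard episode, so the model has effectively seen its neighbors. With a chronological split the model must generalize to future conditions, which is the harder setting that the lower F1-score reflects.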
Proceedings - Design, Automation and Test in Europe, DATE, 2021, pp. 1256-1259
Seyedkazemi Ardebili M., Zanghieri M., Burrello A., Beneventi F., Acquaviva A., Benini L., et al. (2021). Prediction of Thermal Hazards in a Real Datacenter Room Using Temporal Convolutional Networks. Institute of Electrical and Electronics Engineers Inc. [10.23919/DATE51398.2021.9474116].

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/851592