
An alternative approach to dimension reduction for pareto distributed data: a case study / Roccetti, Marco; Delnevo, Giovanni; Casini, Luca; Mirri, Silvia. - In: JOURNAL OF BIG DATA. - ISSN 2196-1115. - ELECTRONIC. - 8:(2021), pp. 39.1-39.23. [10.1186/s40537-021-00428-8]

An alternative approach to dimension reduction for pareto distributed data: a case study

Roccetti, Marco; Delnevo, Giovanni; Casini, Luca; Mirri, Silvia

2021

Abstract

Deep learning models are data analysis tools suited to approximating (non-linear) relationships among variables in order to best predict an outcome. Although these models can answer many important questions, their utility is still harshly criticized, because it is extremely challenging to identify which data descriptors best represent a given phenomenon of interest. From our recent experience developing a deep learning model designed to detect failures in mechanical water meters, we have learnt that prediction accuracy can deteriorate appreciably when specific device descriptors based on categorical data are added to the training input. This can happen because the dimensionality of the data increases excessively, with a corresponding loss of statistical significance. After several unsuccessful experiments with alternative methodologies that either reduce the dimensionality of the data space or employ more traditional machine learning algorithms, we changed the training strategy and reconsidered the categorical data in the light of a Pareto analysis. In essence, we used the categorical descriptors not as an input on which to train our deep learning model, but as a tool to give a new shape to the dataset, based on the Pareto rule. With this data adjustment, we trained a better-performing deep learning model able to detect defective water meters with a prediction accuracy in the range of 87-90%, even in the presence of categorical descriptors.
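The abstract does not say exactly how the Pareto rule was applied, so the following is only a minimal sketch of one plausible reading: a categorical descriptor is used to select rows rather than fed to the model as an input. The function name `pareto_categories`, the 80% coverage threshold, and the toy records (a meter model label paired with a defect flag) are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def pareto_categories(values, coverage=0.8):
    """Return the most frequent categories that together reach `coverage` of all records.

    Hypothetical helper: the 0.8 default mirrors the usual 80/20 reading of the
    Pareto rule and is an assumption, not a value reported in the abstract.
    """
    counts = Counter(values)
    total = sum(counts.values())
    selected, covered = [], 0
    for category, count in counts.most_common():
        if covered >= coverage * total:
            break  # the categories kept so far already reach the target share
        selected.append(category)
        covered += count
    return selected


# Toy data (invented for this sketch): (meter_model, is_defective) pairs.
records = (
    [("A", 0)] * 50 + [("A", 1)] * 10
    + [("B", 0)] * 20 + [("B", 1)] * 5
    + [("C", 0)] * 6 + [("D", 1)] * 4
    + [("E", 0)] * 3 + [("F", 0)] * 2
)

models = [model for model, _ in records]

# Width the categorical descriptor would add if it were one-hot encoded as an input.
one_hot_width = len(set(models))

# Reshape the dataset instead: keep only the rows of the dominant categories.
head = set(pareto_categories(models, coverage=0.8))
reshaped = [(model, label) for model, label in records if model in head]
```

On this toy data the descriptor would add 6 one-hot columns if used as an input, whereas the Pareto selection keeps models A and B and leaves 85 of the 100 records for training, with no extra input dimensions.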

Year: 2021
Journal: JOURNAL OF BIG DATA
DOI: https://dx.doi.org/10.1186/s40537-021-00428-8
Citation: An alternative approach to dimension reduction for pareto distributed data: a case study / Roccetti, Marco; Delnevo, Giovanni; Casini, Luca; Mirri, Silvia. - In: JOURNAL OF BIG DATA. - ISSN 2196-1115. - ELECTRONIC. - 8:(2021), pp. 39.1-39.23. [10.1186/s40537-021-00428-8]
All authors: Roccetti, Marco; Delnevo, Giovanni; Casini, Luca; Mirri, Silvia
Appears in types: 1.01 Journal article

Files in this item:

File	Size	Format
s40537-021-00428-8.pdf (open access; type: publisher's version (PDF); license: Open Access License, Creative Commons Attribution (CC BY))	1.6 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/814932
