Roccetti, M., Delnevo, G., Casini, L., Salomoni, P. (2020). A Cautionary Tale for Machine Learning Design: why we Still Need Human-Assisted Big Data Analysis. MOBILE NETWORKS AND APPLICATIONS, 2020(25), 1075-1083 [10.1007/s11036-020-01530-6].

A Cautionary Tale for Machine Learning Design: why we Still Need Human-Assisted Big Data Analysis

Roccetti M.; Delnevo G.; Casini L.; Salomoni P.

2020

Abstract

Supervised Machine Learning (ML) is commonly assumed to require that algorithms examine a very large number of labeled samples before they can make correct predictions. Yet even a very large dataset is not always sufficient. In our experience, a neural network trained on a database of over fifteen million water meter readings essentially failed to predict when a meter would malfunction or need disassembly based on its history of water consumption measurements. In a second step, we developed a methodology, based on the enforcement of specialized data semantics, that allowed us to extract for training only those samples that were not corrupted by data impurities. With this methodology, we re-trained the neural network to a prediction accuracy of over 80%. At the same time, we realized that the new training dataset was significantly different from the initial one in statistical terms, and also much smaller. We had reached a sort of paradox: we had alleviated the initial problem with a more interpretable model, but we had changed the replicated form of the initial data. To reconcile that paradox, we further enhanced our data semantics with the contribution of field experts. This finally led to the extraction of a training dataset that is truly representative of regular and defective water meters and able to describe the underlying statistical phenomenon, while the resulting classifier still achieves excellent prediction accuracy. The lesson we have learnt is that a human-in-the-loop approach may significantly help to clean and re-organize noisy datasets for an improved ML design experience.

Year: 2020
Journal: MOBILE NETWORKS AND APPLICATIONS
DOI: https://dx.doi.org/10.1007/s11036-020-01530-6
All authors: Roccetti, M.; Delnevo, G.; Casini, L.; Salomoni, P.
Appears in types: 1.01 Journal article

Files in this item:

File	Size	Format
a-cautionary-tale.pdf (open access; type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review; license: free open-access license)	533.17 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/753290
