Dimensionality Reduction and the Strange Case of Categorical Data for Predicting Defective Water Meter Devices

Roccetti, M.; Casini, L.; Delnevo, G.; Bonfante, S.

doi:10.1007/978-3-030-55307-4_24

Following an experiment with a deep learning (DL) model designed to predict whether a water meter device would fail over time, we came across a very strange case. It arose when we tried to strengthen the training of our classifier by using, in addition to the numerical measurements of consumed water, other available contextual information of categorical type. Surprisingly, this additional categorical information did not improve the prediction accuracy, which instead dropped considerably. Having recognized the problem as an excessive increase in the dimensionality of the data space under observation, with a corresponding loss of statistical significance, we changed the training strategy. Observing that every categorical variable followed a quasi-Pareto distribution, we re-trained our DL models, separately for each categorical variable, only on the fraction of meter devices (and corresponding measurements of consumed water) that exhibited the most frequent qualitative values for that variable. This new strategy yielded a prediction accuracy never reached before, 87–88% on average.
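The filtering step described in the abstract, keeping only the meters that show the most frequent values of a categorical variable, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the retained fraction was chosen, so the `coverage` threshold, the function names, the `meter_type` column, and the toy records are all assumptions made for this example.

```python
from collections import Counter


def most_frequent_values(values, coverage=0.75):
    """Return the most frequent category values, in descending order of
    frequency, until their combined share of the records reaches `coverage`.

    The `coverage` threshold is a hypothetical parameter; the source does
    not specify how the cut-off was chosen.
    """
    counts = Counter(values)
    total = len(values)
    kept, covered = [], 0
    for value, count in counts.most_common():
        if covered >= coverage * total:
            break
        kept.append(value)
        covered += count
    return kept


def filter_records(records, column, coverage=0.75):
    """Keep only the records whose `column` holds one of the most frequent values."""
    keep = set(most_frequent_values([r[column] for r in records], coverage))
    return [r for r in records if r[column] in keep]


# Hypothetical toy data: a long-tailed (quasi-Pareto-like) categorical variable.
records = (
    [{"meter_id": i, "meter_type": "A"} for i in range(6)]
    + [{"meter_id": 6 + i, "meter_type": "B"} for i in range(2)]
    + [{"meter_id": 8, "meter_type": "C"}, {"meter_id": 9, "meter_type": "D"}]
)

frequent = most_frequent_values([r["meter_type"] for r in records])
subset = filter_records(records, "meter_type")
```

Under these assumptions, a separate model would then be trained on each such `subset`, one per categorical variable, rather than on the full data set with every categorical variable encoded at once.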

M. Roccetti, L. Casini, G. Delnevo, S. Bonfante (2021). Dimensionality Reduction and the Strange Case of Categorical Data for Predicting Defective Water Meter Devices. Cham, Switzerland: Springer Nature [10.1007/978-3-030-55307-4_24].

Year: 2021
Volume title: Human Interaction, Emerging Technologies and Future Applications III
First page: 155
Last page: 159
Series: Advances in Intelligent Systems and Computing
DOI: https://dx.doi.org/10.1007/978-3-030-55307-4_24
All authors: M. Roccetti, L. Casini, G. Delnevo, S. Bonfante
Appears in types: 4.01 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/769468