Many data scientists now point out that how much Machine Learning (ML) research crosses into practice will depend not just on the ability of the specialized algorithms used to scrutinize positive/negative examples, but also on the quality of the data used to train those algorithms. Our experience training a neural network on a huge dataset of over fifteen million water meter readings confirms this conjecture. In this paper, we report on the actions we took to extract from that database just those data that could correctly represent the complex statistical phenomenon in play. With an adequate reorganization of those data, we obtained an interesting, yet controversial, result. On the one hand, we improved the accuracy of predicting when a water meter fails or needs disassembly based on a history of water consumption measurements, thus making the meter maintenance process smarter. On the other hand, all this came with the paradox of a (statistical) transformation of the initial dataset: while we alleviate a problem with a restructured and more interpretable data model, we simultaneously change the replicated form of those data.

Roccetti, M. (2019). A paradox in ML design: Less data for a smarter water metering cognification experience. New York: ACM [10.1145/3342428.3342685].

A paradox in ML design: Less data for a smarter water metering cognification experience

Roccetti M.; Delnevo G.; Casini L.; Zagni N.; Cappiello G.
2019

ACM International Conference Proceeding Series
Pages 201-206
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/701775