AutoClues: Exploring Clustering Pipelines via AutoML and Diversification

Francia M.; Giovanelli J.; Golfarelli M.
2024

Abstract

AutoML has been applied effectively in supervised learning, mainly in classification tasks, where the goal is to find the best machine-learning pipeline when a ground truth is available. This is not the case for unsupervised tasks, which are exploratory by nature and are performed to unveil hidden insights. Since there is no single right result, analyzing different configurations is more important than returning the best-performing one. In exploratory unsupervised tasks such as cluster analysis, different facets of the dataset could be interesting for the data scientist; for instance, data items can be effectively grouped together in different subspaces of features. This paper presents AutoClues, which explores and returns a dashboard of both relevant and diverse clusterings via AutoML and diversification. AutoML ensures that the explored pipelines for cluster analysis (including pre-processing steps) compute good clusterings. Then, diversification selects, out of the explored clusterings, the ones conveying different clues to the data scientist.
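
The diversification step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy, maximal-marginal-relevance-style selection, the use of the Rand index as a similarity between clusterings, the `trade_off` parameter, and the function names `rand_index` and `diversify` are all assumptions made for illustration. Each candidate is assumed to carry a precomputed relevance score in [0, 1] (for example, a normalized internal validity index produced by the AutoML exploration).

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def diversify(candidates, k, trade_off=0.5):
    """Greedily pick up to k clusterings that are relevant and mutually diverse.

    candidates: list of (name, relevance, labels) tuples (hypothetical format);
    relevance is assumed to lie in [0, 1], higher meaning better.
    trade_off: weight of relevance versus novelty (assumed parameter).
    """
    if k <= 0 or not candidates:
        return []
    remaining = list(candidates)
    # Seed the selection with the most relevant clustering.
    selected = [max(remaining, key=lambda c: c[1])]
    remaining.remove(selected[0])
    while remaining and len(selected) < k:
        def marginal(c):
            # Novelty: dissimilarity to the closest already-selected clustering.
            novelty = min(1 - rand_index(c[2], s[2]) for s in selected)
            return trade_off * c[1] + (1 - trade_off) * novelty
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return [name for name, _, _ in selected]


candidates = [
    ("A", 0.90, [0, 0, 1, 1]),
    ("B", 0.85, [1, 1, 0, 0]),  # same partition as A, labels swapped
    ("C", 0.60, [0, 1, 0, 1]),  # a different grouping of the same items
]
print(diversify(candidates, k=2))
```

In this toy input, clustering B scores almost as well as A but describes the same partition, so the sketch prefers the less relevant but different clustering C as the second entry.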
Advances in Knowledge Discovery and Data Mining. 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2024, Taipei, Taiwan, May 7–10, 2024, Proceedings, Part I, pp. 246–258
Francia M., Giovanelli J., Golfarelli M. (2024). AutoClues: Exploring Clustering Pipelines via AutoML and Diversification. Springer Science and Business Media Deutschland GmbH [10.1007/978-981-97-2242-6_20].
Files in this record:
_PAKDD__AutoML_clustering.pdf
Embargoed until 24/04/2025
Type: Postprint
License: free open-access license
Size: 1.31 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/969752