Francia M., Giovanelli J., Golfarelli M. (2024). AutoClues: Exploring Clustering Pipelines via AutoML and Diversification. Springer Science and Business Media Deutschland GmbH [10.1007/978-981-97-2242-6_20].
AutoClues: Exploring Clustering Pipelines via AutoML and Diversification
Francia M.; Giovanelli J.; Golfarelli M.
2024
Abstract
AutoML has been applied effectively in supervised learning, mainly in classification tasks, where the goal is to find the best machine-learning pipeline when a ground truth is available. This is not the case for unsupervised tasks, which are exploratory by nature and are performed to unveil hidden insights. Since there is no single right result, analyzing different configurations is more important than returning the best-performing one. In exploratory unsupervised tasks such as cluster analysis, different facets of the dataset could be interesting for the data scientist; for instance, data items can be effectively grouped together in different subspaces of features. This paper presents AutoClues, which explores and returns a dashboard of both relevant and diverse clusterings via AutoML and diversification. AutoML ensures that the explored pipelines for cluster analysis (including pre-processing steps) compute good clusterings. Then, diversification selects, out of the explored clusterings, the ones conveying different clues to the data scientist.

File | Size | Format
---|---|---
_PAKDD__AutoML_clustering.pdf (Postprint; embargoed until 24/04/2025; license: free open access) | 1.31 MB | Adobe PDF
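The two stages named in the abstract (explore clustering pipelines, then keep the ones that differ from each other) can be illustrated with a small sketch. This is not the AutoClues implementation: the abstract does not say which quality measure or diversification criterion the paper uses. The sketch assumes a greedy max-min selection with `1 - Rand index` as the dissimilarity between clusterings. The function names, the `(name, quality, labels)` input format, the candidate names, and the quality scores are all hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:  # fewer than two items: treat the labelings as identical
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def diversify(candidates, k, min_quality=0.0):
    """Greedily pick up to k clusterings that are relevant and mutually dissimilar.

    candidates: list of (name, quality, labels) tuples, e.g. the clusterings
    produced while searching over pipelines (hypothetical input format).
    """
    # Relevance filter: drop clusterings below the quality threshold.
    pool = [c for c in candidates if c[1] >= min_quality]
    if not pool:
        return []
    # Seed the selection with the best-scoring clustering.
    selected = [max(pool, key=lambda c: c[1])]
    pool.remove(selected[0])
    while pool and len(selected) < k:
        # Max-min step: take the candidate whose nearest selected
        # clustering is as far away as possible.
        best = max(
            pool,
            key=lambda c: min(1 - rand_index(c[2], s[2]) for s in selected),
        )
        selected.append(best)
        pool.remove(best)
    return selected


# Made-up candidates: two near-duplicates, one different grouping, one poor result.
candidates = [
    ("kmeans_all_features", 0.62, [0, 0, 0, 1, 1, 1]),
    ("kmeans_scaled", 0.60, [0, 0, 0, 1, 1, 1]),
    ("agglo_subspace_a", 0.55, [0, 1, 0, 1, 0, 1]),
    ("dbscan_noisy", 0.10, [0, 0, 0, 0, 0, 1]),
]
dashboard = diversify(candidates, k=2, min_quality=0.5)
print([name for name, _, _ in dashboard])
# ['kmeans_all_features', 'agglo_subspace_a']
```

In this toy run the second-best clustering by score is skipped because it partitions the items exactly like the first. The selection instead returns a lower-scoring clustering that groups the items differently, which is the trade-off between relevance and diversity that the abstract describes.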
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.