Mixed Deep Gaussian Mixture Model: a clustering model for mixed datasets

Viroli, Cinzia
2022

Abstract

Clustering mixed data presents numerous challenges inherent to the highly heterogeneous nature of the variables. Despite this heterogeneity, a clustering algorithm should be able to extract discriminant pieces of information from the variables in order to form groups. In this work we introduce a multilayer, model-based clustering method called the Mixed Deep Gaussian Mixture Model, which can be viewed as an automatic way to merge the clusterings performed separately on continuous and non-continuous data. This architecture is flexible and can be adapted to mixed data as well as to continuous or non-continuous data. In this sense we generalize Generalized Linear Latent Variable Models and Deep Gaussian Mixture Models. We also design a new initialisation strategy and a data-driven method that selects the best specification of the model and the optimal number of clusters for a given dataset. In addition, our model provides continuous low-dimensional representations of the data, which can be a useful tool for visualizing mixed datasets. Finally, we validate the performance of our approach by comparing its results with those of state-of-the-art mixed data clustering models on several commonly used datasets.
Fuchs, R., Pommeret, D., Viroli, C. (2022). Mixed Deep Gaussian Mixture Model: a clustering model for mixed datasets. Advances in Data Analysis and Classification, 16(1), 31-53. DOI: 10.1007/s11634-021-00466-3.
Fuchs, Robin; Pommeret, Denys; Viroli, Cinzia
Files in this item:

MDGMM_ADAC.pdf
Open Access since 01/11/2022
Description: AAM
Type: Postprint
License: Open Access license. Other type of license compatible with Open Access
Size: 962.11 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/840189