In clustering, one may be interested in classifying similar objects into groups, or in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how well do clustering methods designed to fulfil one of these aims perform in terms of the other? To address these questions, two approaches, namely a latent class model (a mixture of multinomial distributions) and partitioning around medoids, are evaluated and compared by the Adjusted Rand Index (which is expected to favour model-based clustering) and the Average Silhouette Width (which is expected to favour distance-based clustering) in a fairly wide simulation study. The clustering outcomes are visualized with a special use of Multidimensional Scaling.
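
The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy data and the use of the simple matching distance between categorical observations are all assumptions made here for illustration, since the abstract does not state which dissimilarity was used. The Adjusted Rand Index compares a clustering with the true grouping, while the Average Silhouette Width only uses the dissimilarities and the clustering itself.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same objects."""
    n = len(labels_a)
    # Pairs of objects placed together in both partitions, in A, and in B.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def simple_matching_distance(x, y):
    """Proportion of categorical variables on which x and y differ (assumed here)."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def average_silhouette_width(data, labels):
    """Average silhouette width; requires at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the object's own cluster.
        a = sum(simple_matching_distance(data[i], data[j])
                for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(simple_matching_distance(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        widths.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(widths) / len(widths)


# Hypothetical toy data: six observations on three categorical variables.
toy_data = [("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
            ("b", "z", "r"), ("b", "z", "s"), ("c", "z", "r")]
true_groups = [0, 0, 0, 1, 1, 1]
found_groups = [0, 0, 1, 1, 1, 1]  # a made-up clustering result

ari = adjusted_rand_index(true_groups, found_groups)
asw = average_silhouette_width(toy_data, found_groups)
print(f"ARI = {ari:.3f}, ASW = {asw:.3f}")
```

In a simulation setting the true groups are known, so both indexes can be computed for every method; on real data only the Average Silhouette Width would be available.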

Laura Anderlucci, Christian Hennig (2012). Clustering of categorical data: a comparison of different approaches. QUADERNI DI STATISTICA, 14, 1-4.

Clustering of categorical data: a comparison of different approaches

Laura Anderlucci; Christian Martin Hennig
2012


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/156004