Clustering can pursue two different aims: classifying similar objects into groups, or finding the observations that come from the same true homogeneous distribution. Do both aims lead to the same clustering? And how well do clustering methods designed for one aim perform in terms of the other? To address these questions, two approaches, a latent class model (a mixture of multinomial distributions) and partitioning around medoids, are evaluated and compared using the Adjusted Rand Index (which is expected to favour model-based clustering) and the Average Silhouette Width (which is expected to favour distance-based clustering) in a fairly wide simulation study. The clustering outcomes are visualized with a special use of Multidimensional Scaling.
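To make the two evaluation criteria concrete, the following is a minimal Python sketch of how an Adjusted Rand Index and an Average Silhouette Width could be computed for categorical data. It is not the authors' code. The function names are hypothetical, the toy data are invented for illustration, and the simple matching dissimilarity (the share of variables on which two observations differ) is an assumption, since the abstract does not say which dissimilarity was used.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same observations."""
    n = len(labels_a)
    # Pair counts from the contingency table and from its margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions have a single cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def simple_matching_distance(x, y):
    """Assumed dissimilarity: proportion of variables on which x and y differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def average_silhouette_width(data, labels):
    """Average Silhouette Width; requires at least two clusters."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a singleton cluster gets silhouette width 0.
            widths.append(0.0)
            continue
        # a: mean dissimilarity to the other members of the own cluster.
        a = sum(simple_matching_distance(data[i], data[j]) for j in own) / len(own)
        # b: smallest mean dissimilarity to any other cluster.
        b = min(
            sum(simple_matching_distance(data[i], data[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


# Toy data: six observations on three categorical variables (invented).
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
truth = [0, 0, 0, 1, 1, 1]   # hypothetical true classes
found = [1, 1, 1, 0, 0, 0]   # hypothetical clustering, same groups relabelled

ari = adjusted_rand_index(truth, found)
asw = average_silhouette_width(data, found)
print(f"ARI = {ari:.3f}, ASW = {asw:.3f}")
```

The Adjusted Rand Index needs the true classes, so it measures recovery of the generating distribution, while the Average Silhouette Width uses only the dissimilarities and the clustering itself. This difference is why the two indices can be expected to favour different methods.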
Laura Anderlucci, Christian Hennig (2012). Clustering of categorical data: a comparison of different approaches. Quaderni di Statistica, 14, 1-4.
Clustering of categorical data: a comparison of different approaches
ANDERLUCCI, LAURA;HENNIG, CHRISTIAN MARTIN
2012