G. Galimberti, G. Soffritti (2007). Model-based methods to identify multiple cluster structures in a data set. Computational Statistics & Data Analysis, 52, 520-536.

Model-based methods to identify multiple cluster structures in a data set

Giuliano Galimberti; Gabriele Soffritti
2007

Abstract

There is an interest in the problem of identifying different partitions of a given set of units obtained according to different subsets of the observed variables (multiple cluster structures). A model-based procedure has been previously developed for detecting multiple cluster structures from independent subsets of variables. The method relies on model-based clustering methods and on a comparison among mixture models using the Bayesian Information Criterion. A generalization of this method which allows the use of any model-selection criterion is considered. A new approach combining the generalized model-based procedure with variable-clustering methods is proposed. The usefulness of the new method is shown using simulated and real examples. Monte Carlo methods are employed to evaluate the performance of various approaches. Data matrices with two cluster structures are analyzed taking into account the separation of clusters, the heterogeneity within clusters and the dependence of cluster structures.
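
The BIC comparison mentioned in the abstract can be illustrated with a minimal sketch. It rests on one fact: when blocks of variables are modelled as independent, their log-likelihoods and parameter counts add, so the BIC of the combined model equals the sum of the per-block BICs. The sign convention (lower BIC is better), the function names and all numeric values below are assumptions made for illustration; they are not taken from the paper.

```python
import math


def bic(loglik, n_params, n_units):
    """BIC in the 'lower is better' form: -2 * logL + k * ln(n).

    This sign convention is an assumption; the opposite one
    (2 * logL - k * ln(n), higher is better) is also in common use.
    """
    return -2.0 * loglik + n_params * math.log(n_units)


def bic_independent_blocks(blocks, n_units):
    """BIC of a model whose variable blocks are mutually independent.

    blocks: list of (loglik, n_params) pairs, one per block of variables,
    each obtained from a mixture model fitted to that block alone.
    """
    total_loglik = sum(ll for ll, _ in blocks)
    total_params = sum(k for _, k in blocks)
    return bic(total_loglik, total_params, n_units)


# Hypothetical values, for illustration only.
n_units = 200
single_structure = (-1510.0, 29)               # one mixture on all variables
two_structures = [(-720.0, 11), (-760.0, 11)]  # one mixture per variable subset

bic_single = bic(single_structure[0], single_structure[1], n_units)
bic_multi = bic_independent_blocks(two_structures, n_units)

# Additivity check: the combined BIC equals the sum of the block BICs.
sum_of_block_bics = sum(bic(ll, k, n_units) for ll, k in two_structures)

# Under this convention, the model with the smaller BIC is preferred.
prefer_multiple = bic_multi < bic_single
```

In practice the log-likelihoods and parameter counts would come from fitted mixture models. Replacing `bic` with another criterion mirrors the generalization to any model-selection criterion described in the abstract, although the additivity shown here is specific to criteria of this penalized-likelihood form.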

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/57376
