Galimberti, G., Manisi, A., Soffritti, G. (2018). Modelling the role of variables in model-based cluster analysis. STATISTICS AND COMPUTING, 28(1), 145-169 [10.1007/s11222-017-9723-0].

Modelling the role of variables in model-based cluster analysis

Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele

2018

Abstract

In the framework of cluster analysis based on Gaussian mixture models, it is usually assumed that all the variables provide information about the clustering of the sample units. Several variable selection procedures are available in order to detect the structure of interest for the clustering when this structure is contained in a variable sub-vector. Currently, in these procedures a variable is assumed to play one of (up to) three roles: (1) informative, (2) uninformative and correlated with some informative variables, (3) uninformative and uncorrelated with any informative variable. A more general approach for modelling the role of a variable is proposed by taking into account the possibility that the variable vector provides information about more than one structure of interest for the clustering. This approach is developed by assuming that such information is given by non-overlapped and possibly correlated sub-vectors of variables; it is also assumed that the model for the variable vector is equal to a product of conditionally independent Gaussian mixture models (one for each variable sub-vector). Details about model identifiability, parameter estimation and model selection are provided. The usefulness and effectiveness of the described methodology are illustrated using simulated and real datasets.
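As a rough illustration of the "product of Gaussian mixture models, one for each variable sub-vector" idea, the following minimal sketch evaluates the log-density of such a product. It covers only the simplified special case in which the sub-vectors are mutually independent; the possibly correlated sub-vectors handled in the article are not modelled here. The function names, the `blocks` layout and all parameter values are hypothetical and are not taken from the article.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, evaluated at each row of x."""
    d = mean.shape[0]
    diff = x - mean
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    z = np.linalg.solve(chol, diff.T)       # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))


def mixture_logpdf(x, weights, means, covs):
    """Log-density of a Gaussian mixture, using a stable log-sum-exp."""
    comp = np.stack([
        np.log(w) + gaussian_logpdf(x, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))


def product_of_mixtures_logpdf(x, blocks):
    """Sum the mixture log-densities over non-overlapping column blocks.

    Assumption: the blocks are treated as mutually independent, which is
    a simplification of the model described in the abstract.
    """
    total = np.zeros(x.shape[0])
    for columns, weights, means, covs in blocks:
        total += mixture_logpdf(x[:, columns], weights, means, covs)
    return total


# Hypothetical example: variables 0-1 carry one clustering structure,
# variable 2 carries a second, unrelated one.
blocks = [
    ([0, 1], [0.5, 0.5],
     [np.array([0.0, 0.0]), np.array([3.0, 3.0])],
     [np.eye(2), np.eye(2)]),
    ([2], [0.3, 0.7],
     [np.array([-2.0]), np.array([2.0])],
     [np.eye(1), np.eye(1)]),
]
x = np.array([[0.0, 0.0, 2.0], [3.0, 3.0, -2.0]])
log_density = product_of_mixtures_logpdf(x, blocks)
```

Each block has its own mixture weights and components, so each sub-vector can induce a different partition of the same sample units, which is the sense in which the variable vector can carry more than one clustering structure.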

Year	2018
Journal	STATISTICS AND COMPUTING
DOI	https://dx.doi.org/10.1007/s11222-017-9723-0
All authors	Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele
Appears in types	1.01 Journal article

Files in this item:

File	Description	Type	License	Access	Size	Format
supplementary material.pdf	Supplementary materials	Supplementary file	License for free open access	Open access	209.75 kB	Adobe PDF
per IRIS.pdf	Article	Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review	License for free open access	Open access from 13/01/2018	578.78 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/585288
