
Finite mixture modeling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provides clustering information. Selecting that subset yields a more parsimonious model, more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package clustvarsel, which performs subset selection for model-based clustering. An improved version of the Raftery and Dean (2006) methodology is implemented in the new release of the package to find the (locally) optimal subset of variables with group/cluster information in a dataset. Search over the solution space is performed using either a stepwise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms are discussed, as well as a parallel implementation of the stepwise search. Usage of the package is presented through the discussion of several data examples.

Scrucca, L., Raftery, A.E. (2018). Clustvarsel: A package implementing variable selection for Gaussian model-based clustering in R. JOURNAL OF STATISTICAL SOFTWARE, 84(1), 1-28 [10.18637/jss.v084.i01].

Clustvarsel: A package implementing variable selection for Gaussian model-based clustering in R

Scrucca L.; Raftery A. E.

2018



Year: 2018
Journal: Journal of Statistical Software
DOI: https://dx.doi.org/10.18637/jss.v084.i01



Use this identifier to cite or link to this document: https://hdl.handle.net/11585/997661

