

de Amorim RC, Hennig C (2015). Recovering the number of clusters in data sets with noise features using feature rescaling factors. INFORMATION SCIENCES, 324, 126-145 [10.1016/j.ins.2015.06.039].

Recovering the number of clusters in data sets with noise features using feature rescaling factors

de Amorim RC; Hennig C

2015

Abstract

In this paper we introduce three methods for re-scaling data sets aiming at improving the likelihood of clustering validity indexes to return the true number of spherical Gaussian clusters with additional noise features. Our method obtains feature re-scaling factors taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using squared Euclidean, Manhattan, and the pth power of the Minkowski distance), Dunn’s, Calinski–Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.
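To make the setting of the abstract concrete, the following is a minimal sketch of how a validity index can be used to estimate the number of clusters. It does not implement the feature re-scaling methods of the paper. It only generates spherical Gaussian clusters (optionally with uniformly random noise features), runs a plain k-means for several candidate values of k, and picks the k with the highest Calinski–Harabasz index. All function names, the cluster centres, the noise range and the farthest-point initialisation are assumptions chosen for illustration.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def make_data(n_per_cluster=50, noise_features=0, seed=0):
    """Three spherical Gaussian clusters in 2D (assumed centres, unit variance),
    plus optional uniformly random noise features (assumed range [0, 20])."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (20.0, 0.0), (10.0, 17.0)]
    data = []
    for cx, cy in centers:
        for _ in range(n_per_cluster):
            point = [rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
            point += [rng.uniform(0.0, 20.0) for _ in range(noise_features)]
            data.append(point)
    return data


def kmeans(data, k, iters=100):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centers[j] = mean(members)
    return labels


def calinski_harabasz(data, labels):
    """Ratio of between-cluster to within-cluster dispersion, each divided
    by its degrees of freedom. Requires at least two non-empty clusters."""
    groups = {}
    for p, lab in zip(data, labels):
        groups.setdefault(lab, []).append(p)
    n, k = len(data), len(groups)
    grand = mean(data)
    between = within = 0.0
    for members in groups.values():
        centroid = mean(members)
        between += len(members) * sqdist(centroid, grand)
        within += sum(sqdist(p, centroid) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def estimate_k(data, candidates=range(2, 7)):
    """Return the candidate k with the highest index value, and all scores."""
    scores = {k: calinski_harabasz(data, kmeans(data, k)) for k in candidates}
    return max(scores, key=scores.get), scores


best_k, scores = estimate_k(make_data(noise_features=0))
print("estimated number of clusters:", best_k)
```

Calling `make_data` with `noise_features` set to a positive value and re-running `estimate_k` gives a simple way to observe how noise features affect the estimate on unscaled data, which is the situation the re-scaling factors are meant to address.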


Year: 2015

Journal: INFORMATION SCIENCES

DOI: https://dx.doi.org/10.1016/j.ins.2015.06.039

All authors: de Amorim RC; Hennig C

Publication type: 1.01 Journal article


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/676354

Warning: the data shown have not been validated by the university.
