
A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, among others within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
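The calibration and aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the two indexes (`homogeneity`, `separation`), the uniformly random label assignment in `random_labels`, the z-score standardisation against the random clusterings, and the weighted mean used for aggregation. The function names and parameters are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def homogeneity(dist, labels):
    """Negative mean within-cluster distance, so that larger is better."""
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # ignore each point's distance to itself
    return -dist[same].mean() if same.any() else 0.0


def separation(dist, labels):
    """Mean distance between points in different clusters (larger is better)."""
    different = labels[:, None] != labels[None, :]
    return dist[different].mean() if different.any() else 0.0


def random_labels(n, k, rng):
    """Assumed reference scheme: uniform random labels, no empty cluster (n >= k)."""
    labels = np.concatenate([np.arange(k), rng.integers(0, k, size=n - k)])
    return rng.permutation(labels)


def aggregated_index(X, labels, n_random=200, weights=(1.0, 1.0), seed=0):
    """Calibrate each index against random clusterings, then aggregate.

    Calibration here is a z-score relative to the index values of random
    clusterings with the same number of clusters on the same data; the
    aggregate is a weighted mean of the calibrated values (both assumptions).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    dist = pairwise_distances(np.asarray(X, dtype=float))
    indexes = (homogeneity, separation)
    k = len(np.unique(labels))

    observed = np.array([f(dist, labels) for f in indexes])

    reference = []
    for _ in range(n_random):
        r = random_labels(len(labels), k, rng)  # one random clustering
        reference.append([f(dist, r) for f in indexes])
    reference = np.array(reference)

    # Standardise each index by the mean and spread of its random-clustering values.
    z = (observed - reference.mean(axis=0)) / reference.std(axis=0)
    w = np.asarray(weights, dtype=float)
    return float((w * z).sum() / w.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (30, 2))])
    true_labels = np.repeat([0, 1], 30)
    shuffled = rng.permutation(true_labels)
    print("two-blob labels:", aggregated_index(X, true_labels))
    print("shuffled labels:", aggregated_index(X, shuffled))
```

Because both indexes are put on a common scale before averaging, changing `weights` expresses which characteristic matters more in a given application, which is the role the abstract assigns to the user's choice of indexes.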

Akhanli S.E., Hennig C. (2020). Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes. STATISTICS AND COMPUTING, 30(5 (September)), 1523-1544 [10.1007/s11222-020-09958-2].

Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

Hennig C. (Member of the Collaboration Group)

2020



Year: 2020
Journal: STATISTICS AND COMPUTING
DOI: https://dx.doi.org/10.1007/s11222-020-09958-2
All authors: Akhanli S.E.; Hennig C.
Appears in types: 1.01 Journal article

Files in this item:

File: akhanli-hennig-arxiv_revision3.pdf
Access: Open Access since 26/06/2021
Description: arXiv version of the final revision before publication (i.e. before tiny editorial changes)
Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review
License: Free open access license
Size: 1.46 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/783629
