It is argued that the best number of clusters k depends crucially on the aim of clustering. Existing supposedly “objective” methods of estimating k ignore this. Instead, k can be determined by listing a number of requirements for a good clustering in the given application and finding a k that fulfils them all. The approach is illustrated by application to the problem of finding the number of species in a data set of Australasian Tetragonula bees. The requirements here include two new statistics formalising the largest within-cluster gap and cluster separation. Due to the typical nature of expert knowledge, it is difficult to make the requirements precise, and a number of subjective decisions are involved.
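The idea of choosing k by checking a list of requirements can be illustrated with a minimal sketch. The abstract does not give the paper's definitions, so everything here is an assumption made for illustration: the largest within-cluster gap is taken as the longest edge of a cluster's minimum spanning tree, cluster separation as the smallest distance between points in different clusters, and the function names, thresholds, and toy data are hypothetical.

```python
import math
from itertools import combinations


def widest_gap(points):
    """Longest minimum-spanning-tree edge within one cluster (Prim's algorithm).

    Assumed stand-in for a "largest within-cluster gap" statistic; it equals
    the largest gap across which the cluster could be split in two.
    """
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = {0}
    # best[i] = distance from point i to the nearest point already in the tree
    best = [math.dist(points[0], p) for p in points]
    widest = 0.0
    while len(in_tree) < n:
        j = min((i for i in range(n) if i not in in_tree), key=lambda i: best[i])
        widest = max(widest, best[j])
        in_tree.add(j)
        for i in range(n):
            if i not in in_tree:
                best[i] = min(best[i], math.dist(points[j], points[i]))
    return widest


def min_separation(clusters):
    """Smallest distance between two points in different clusters (assumed statistic)."""
    return min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )


def acceptable_ks(clusterings, max_gap, min_sep):
    """Return every k whose clustering meets both (hypothetical) requirements."""
    accepted = []
    for k, clusters in sorted(clusterings.items()):
        gap_ok = max(widest_gap(c) for c in clusters) <= max_gap
        sep_ok = len(clusters) < 2 or min_separation(clusters) >= min_sep
        if gap_ok and sep_ok:
            accepted.append(k)
    return accepted


# Toy one-dimensional data, clustered by hand for k = 1, 2, 3.
example = {
    1: [[(0,), (1,), (2,), (10,), (11,), (12,)]],
    2: [[(0,), (1,), (2,)], [(10,), (11,), (12,)]],
    3: [[(0,), (1,)], [(2,)], [(10,), (11,), (12,)]],
}
result = acceptable_ks(example, max_gap=2.0, min_sep=3.0)
```

With these assumed thresholds only k = 2 passes: the single cluster for k = 1 contains a gap of 8, and for k = 3 two clusters lie only 1 apart. In a real application the thresholds would come from subject-matter knowledge, which is where the subjective decisions mentioned above enter.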
Hennig C (2014). How many bee species? A case study in determining the number of clusters. Berlin: Springer [10.1007/978-3-319-01595-8_5].
How many bee species? A case study in determining the number of clusters
Hennig C
2014