Simulation studies are often used to compare clustering methods, whether to promote a new method or to investigate the quality of existing methods from a neutral point of view. I go through a number of aspects of designing and running such studies, including the definition and measurement of clustering quality, the choice of models from which to generate data, the aggregation and visualisation of results, and the limits of what we can learn from such studies. The paper may be useful for researchers who run such simulation studies and for those interested in their results. Some aspects are also relevant to more general simulation studies outside the domain of cluster analysis.
Christian Martin Hennig (2018). Some Thoughts on Simulation Studies to Compare Clustering Methods. Archives of Data Science, Series A, 5, 1-21 [10.5445/KSP/1000087327/24].
Some Thoughts on Simulation Studies to Compare Clustering Methods
Christian Martin Hennig
2018