Quantile-based clustering / Hennig, Christian; Viroli, Cinzia; Anderlucci, Laura. - In: ELECTRONIC JOURNAL OF STATISTICS. - ISSN 1935-7524. - ELECTRONIC. - 13:2(2019), pp. 4849-4883. [10.1214/19-EJS1640]
Quantile-based clustering
Hennig, Christian; Viroli, Cinzia; Anderlucci, Laura
2019
Abstract
A new cluster analysis method, K-quantiles clustering, is introduced. K-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd’s algorithm for K-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although K-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace distributions. The consistency of K-quantiles clustering is proved, and it is shown that K-quantiles clusters correspond to well-separated mixture components in a nonparametric mixture. In a simulation, K-quantiles clustering is compared with a number of popular clustering methods, with good results. A high-dimensional microarray dataset is clustered by K-quantiles.

File | Size | Format |
---|---|---|
19-EJS1640.pdf (open access; type: publisher's version (PDF); license: Open Access license, Creative Commons Attribution (CC BY)) | 577.16 kB | Adobe PDF |
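
The abstract describes a greedy, Lloyd-style algorithm but does not spell it out, so the following is only a minimal sketch of such an alternation, not the authors' implementation. It assumes a single fixed quantile level `theta` shared by all variables, the check (pinball) loss as the discrepancy between a point and a cluster quantile, and no variable scaling; the versions mentioned in the abstract are richer than this. The names `check_loss` and `k_quantiles_sketch` are hypothetical.

```python
import numpy as np


def check_loss(X, q, theta):
    """Asymmetric absolute deviation of each row of X from q, summed over variables.

    Assumption: this check (pinball) loss stands in for the paper's discrepancy.
    """
    diff = X - q
    return np.sum(np.where(diff >= 0, theta * diff, (theta - 1.0) * diff), axis=1)


def k_quantiles_sketch(X, k, theta=0.5, n_iter=50, seed=0):
    """Lloyd-style alternation with cluster-wise theta-quantiles as centers."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialise the centers with k distinct observations (fancy indexing copies).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the smallest loss.
        losses = np.stack([check_loss(X, c, theta) for c in centers], axis=1)
        new_labels = losses.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
        # Update step: per-variable sample theta-quantile of each cluster
        # (NumPy's default interpolated estimate); empty clusters keep their center.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = np.quantile(members, theta, axis=0)
    return labels, centers
```

With `theta=0.5` this reduces to a K-medians-type procedure; other values of `theta` shift each cluster's representative point towards a lower or upper quantile, which is one simple way to accommodate the within-cluster skewness the abstract mentions.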