A new cluster analysis method, K-quantiles clustering, is introduced. K-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd’s algorithm for K-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although K-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace-distributions. The consistency of K-quantiles clustering is proved, and it is shown that K-quantiles clusters correspond to well separated mixture components in a nonparametric mixture. In a simulation, K-quantiles clustering is compared with a number of popular clustering methods with good results. A high-dimensional microarray dataset is clustered by K-quantiles.

Hennig, C., Viroli, C., Anderlucci, L. (2019). Quantile-based clustering. ELECTRONIC JOURNAL OF STATISTICS, 13(2), 4849-4883 [10.1214/19-EJS1640].

Quantile-based clustering

Hennig, Christian
;
Viroli, Cinzia;Anderlucci, Laura
2019

Abstract

A new cluster analysis method, K-quantiles clustering, is introduced. K-quantiles clustering can be computed by a simple greedy algorithm in the style of the classical Lloyd’s algorithm for K-means. It can be applied to large and high-dimensional datasets. It allows for within-cluster skewness and internal variable scaling based on within-cluster variation. Different versions allow for different levels of parsimony and computational efficiency. Although K-quantiles clustering is conceived as nonparametric, it can be connected to a fixed partition model of generalized asymmetric Laplace-distributions. The consistency of K-quantiles clustering is proved, and it is shown that K-quantiles clusters correspond to well separated mixture components in a nonparametric mixture. In a simulation, K-quantiles clustering is compared with a number of popular clustering methods with good results. A high-dimensional microarray dataset is clustered by K-quantiles.
2019
Hennig, C., Viroli, C., Anderlucci, L. (2019). Quantile-based clustering. ELECTRONIC JOURNAL OF STATISTICS, 13(2), 4849-4883 [10.1214/19-EJS1640].
Hennig, Christian; Viroli, Cinzia; Anderlucci, Laura
File in questo prodotto:
File Dimensione Formato  
19-EJS1640.pdf

accesso aperto

Tipo: Versione (PDF) editoriale
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione 577.16 kB
Formato Adobe PDF
577.16 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/707699
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 3
social impact