The clustering of bounded data presents unique challenges in statistical analysis due to the constraints imposed on the data values. This paper introduces a novel method for model-based clustering specifically designed for bounded data. Building on the transformation-based approach to Gaussian mixture density estimation introduced by Scrucca (Biometrical Journal, 61(4), 873–888, 2019), we extend this framework to develop a probabilistic clustering algorithm for data with bounded support that allows for accurate clustering while respecting the natural bounds of the variables. In our proposal, a flexible range-power transformation is employed to map the data from its bounded domain to the unrestricted real space, hence enabling the estimation of Gaussian mixture models in the transformed space. Despite the close connection to density estimation, the behavior of this approach has not been previously investigated in the literature. Furthermore, we introduce a novel measure of clustering uncertainty, the normalized classification entropy (NCE), which provides a general and interpretable measure of classification uncertainty. The performance of the proposed method is evaluated through real-world data applications involving both fully and partially bounded data, in both univariate and multivariate settings, showing improved cluster recovery and interpretability. Overall, the empirical results demonstrate the effectiveness and advantages of our approach over traditional and advanced model-based clustering techniques that rely on distributions with bounded support.
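The two ingredients named in the abstract, the range-power transformation and the normalized classification entropy, can be illustrated with a minimal Python sketch. The abstract does not state either formula, so both are assumptions here: the transformation is taken to be a range step, (x - l)/(u - x) for doubly bounded data or x - l for lower-bounded data, followed by a Box-Cox power step, and NCE is taken to be the entropy of the posterior membership probabilities divided by n log G. The function names `range_power` and `normalized_classification_entropy` are hypothetical and do not come from the paper or any package.

```python
import math


def range_power(x, lower, upper=math.inf, lam=0.0):
    """Assumed range-power transform of a single value x in (lower, upper).

    Range step: (x - lower) / (upper - x) if upper is finite, else x - lower.
    Power step: Box-Cox with parameter lam (natural log when lam == 0).
    """
    if not lower < x < upper:
        raise ValueError("x must lie strictly inside (lower, upper)")
    r = (x - lower) / (upper - x) if math.isfinite(upper) else x - lower
    return math.log(r) if lam == 0 else (r**lam - 1.0) / lam


def normalized_classification_entropy(z):
    """Assumed NCE: -sum_i sum_k z_ik log z_ik / (n log G).

    z is a list of n rows, each holding G posterior membership
    probabilities that sum to one. Terms with z_ik == 0 contribute zero.
    """
    n, g = len(z), len(z[0])
    entropy = -sum(p * math.log(p) for row in z for p in row if p > 0)
    return entropy / (n * math.log(g))


# A proportion in (0, 1) with lam = 0 reduces to the logit.
midpoint = range_power(0.5, 0.0, 1.0)

# Posterior probabilities for three observations and two clusters.
posteriors = [[0.9, 0.1], [0.5, 0.5], [1.0, 0.0]]
nce = normalized_classification_entropy(posteriors)
```

Under these assumptions NCE lies between 0 (every observation assigned with certainty) and 1 (every observation equally likely to belong to any cluster). In the method described above, a Gaussian mixture would be fitted to the transformed values and the posterior probabilities would come from that fit; the sketch omits the fitting step.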
Scrucca, L. (2025). A Model-Based Clustering Approach for Bounded Data Using Transformation-Based Gaussian Mixture Models. Journal of Classification, 0, 1-19 [10.1007/s00357-025-09511-8].
A Model-Based Clustering Approach for Bounded Data Using Transformation-Based Gaussian Mixture Models
Scrucca L.
2025


