Bridging Supervised and Unsupervised Learning for Classification of Breast Tissue

Negri, Virginia; Iadarola, Grazia; Mingotti, Alessandro; Tinarelli, Roberto; Peretto, Lorenzo
2025

Abstract

Understanding the underlying structure of medical data is essential for developing robust and reliable classification models. Supervised learning, which relies on predefined classes, may fail to capture the intrinsic patterns within the data, potentially leading to suboptimal outcomes. This study investigates the application of unsupervised clustering to analyze and validate the structure of a public medical dataset, the Breast Tissue Dataset, with varying class configurations (6 vs. 4 classes). Clustering methods, such as KMeans and Affinity Propagation models, were applied alongside classification models, including Random Forest and XGBoost. Key performance metrics, such as accuracy and confusion matrices, were employed to evaluate classification performance, while clustering results were assessed using the Adjusted Rand Index (ARI) and the Hopkins Test, which evaluates the clustering tendency of datasets. Additionally, the robustness of clustering to measurement uncertainty was examined by introducing synthetic noise (5 % and 10 % perturbations) into the input data, simulating real-world variability. The study further explores how clustering can reveal insights into class labels and assess the separability of different groups. Results demonstrate the utility of combining unsupervised clustering with supervised methods to enhance data exploration, assess the reliability of predefined labels, and improve classification in medical applications, even in the presence of measurement uncertainty.
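The comparison between cluster assignments and predefined class labels, and the noise-robustness check, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `adjusted_rand_index` and `perturb` are hypothetical, and the uniform multiplicative noise model is an assumption, since the abstract states only that 5 % and 10 % perturbations were applied. The ARI follows the standard pair-counting definition and uses only the Python standard library.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    # Contingency counts: number of samples per (class, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(n, 2)
    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_rows * sum_cols / total_pairs if total_pairs else 0.0
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings form a single group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def perturb(rows, level, seed=0):
    """Scale each feature by a random factor in [1 - level, 1 + level].

    Assumption: uniform multiplicative noise. The noise model actually
    used in the study is not described in the abstract.
    """
    rng = random.Random(seed)
    return [[x * (1 + rng.uniform(-level, level)) for x in row] for row in rows]
```

Under these assumptions, a robustness check would cluster the original features and the output of `perturb(features, 0.05)` or `perturb(features, 0.10)`, then compare each clustering with the class labels through `adjusted_rand_index`. Because the ARI is invariant to label permutation, a clustering that reproduces the classes under different cluster IDs still scores 1.0.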
Year: 2025
Venue: MEMEA
Pages: 1-6
Negri, V., Iadarola, G., Mingotti, A., Tinarelli, R., Peretto, L. (2025). Bridging Supervised and Unsupervised Learning for Classification of Breast Tissue. Institute of Electrical and Electronics Engineers Inc. [10.1109/memea65319.2025.11067980].
Negri, Virginia; Iadarola, Grazia; Mingotti, Alessandro; Tinarelli, Roberto; Peretto, Lorenzo
Files in this record:
File: MeMea_final.pdf

Embargo until 10 July 2027

Type: Postprint / Author's Accepted Manuscript (AAM) - version accepted for publication after peer review
License: Free open-access license
Size: 1.11 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1021830