The rise of self-supervised learning (SSL) in medical imaging holds immense potential, particularly for leveraging unlabeled data and achieving surprisingly strong performance when annotations are limited. Our study shows that comparisons with traditional supervised learning (SL) are often confounded by differences in workflows, which can lead to biased conclusions. SL is typically employed with a one-stage training workflow, whereas the typical SSL linear evaluation workflow is a two-stage process: pre-training a backbone with a projector, followed by fine-tuning a randomly initialized, task-specific classification head that replaces the projector. We show that the two-stage workflow, when applied to SL, can change the performance of the trained model. This is especially important when selecting the appropriate paradigm for medical imaging classification, where the outcomes can have a clinical impact. We experimented with four medical imaging datasets, targeting age prediction and Alzheimer's disease diagnosis from brain MRI, pneumonia diagnosis from chest X-ray, and diagnosis of choroidal neovascularization in the retina from optical coherence tomography. For each dataset, we imposed different configurations of assumed label availability and class frequency distribution. For each configuration, we performed 30 experiments (5 for the largest dataset) and a robust statistical analysis of the results, which shows that the choice of workflow can alter the performance of the trained model. This finding suggests that the typical comparisons between SL and SSL in the literature may reflect not only the learning paradigm itself but also the workflow, which is agnostic to the paradigm. In medical imaging, where model performance directly impacts clinical decision-making and patient outcomes, ensuring fair and robust comparisons is critical. By addressing this overlooked bias, our work provides actionable insights to advance reliable methodologies, paving the way for more effective and trustworthy AI-driven solutions in healthcare.
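
The difference between the one-stage and two-stage workflows described above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' code: the tiny NumPy network, the synthetic Gaussian data, the learning rate and epoch counts, the choice to keep the backbone frozen in stage 2, and every name used (`init_linear`, `train`, `one_stage`, `two_stage`, `compare_workflows`) are hypothetical.

```python
import numpy as np


def init_linear(rng, n_in, n_out):
    """Randomly initialized linear layer (small weights, zero bias)."""
    return {"W": rng.normal(0.0, 0.1, size=(n_in, n_out)), "b": np.zeros(n_out)}


def forward(backbone, top, X):
    """Backbone features followed by a top layer (projector or head)."""
    H = np.tanh(X @ backbone["W"] + backbone["b"])
    return H, H @ top["W"] + top["b"]


def train(backbone, top, X, y, epochs, lr=0.1, freeze_backbone=False):
    """Full-batch gradient descent on softmax cross-entropy (in place)."""
    n = len(y)
    for _ in range(epochs):
        H, logits = forward(backbone, top, X)
        logits = logits - logits.max(axis=1, keepdims=True)  # stability
        G = np.exp(logits)
        G /= G.sum(axis=1, keepdims=True)   # softmax probabilities
        G[np.arange(n), y] -= 1.0
        G /= n                              # gradient of loss w.r.t. logits
        if not freeze_backbone:
            dH = (G @ top["W"].T) * (1.0 - H ** 2)  # backprop through tanh
            backbone["W"] -= lr * (X.T @ dH)
            backbone["b"] -= lr * dH.sum(axis=0)
        top["W"] -= lr * (H.T @ G)
        top["b"] -= lr * G.sum(axis=0)


def accuracy(backbone, top, X, y):
    _, logits = forward(backbone, top, X)
    return float((logits.argmax(axis=1) == y).mean())


def make_toy_data(rng, n_per_class=100, n_features=4):
    """Two Gaussian blobs; synthetic stand-in, not medical imaging data."""
    X = np.vstack([rng.normal(-1.0, 1.0, (n_per_class, n_features)),
                   rng.normal(+1.0, 1.0, (n_per_class, n_features))])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def one_stage(rng, X, y, hidden=8, epochs=300):
    """One-stage SL: backbone and classification head trained jointly."""
    backbone = init_linear(rng, X.shape[1], hidden)
    head = init_linear(rng, hidden, 2)
    train(backbone, head, X, y, epochs)
    return backbone, head


def two_stage(rng, X, y, hidden=8, epochs=300):
    """Two-stage workflow applied to SL (assumed setup)."""
    # Stage 1: pre-train the backbone together with a projector.
    backbone = init_linear(rng, X.shape[1], hidden)
    projector = init_linear(rng, hidden, 2)
    train(backbone, projector, X, y, epochs)
    # Stage 2: discard the projector and train a randomly initialized head.
    # Freezing the backbone here is an assumption of this sketch.
    head = init_linear(rng, hidden, 2)
    train(backbone, head, X, y, epochs, freeze_backbone=True)
    return backbone, head


def compare_workflows(seed=0):
    """Train both workflows on the same data; return held-out accuracy."""
    rng = np.random.default_rng(seed)
    X_train, y_train = make_toy_data(rng)
    X_test, y_test = make_toy_data(rng)
    results = {}
    for name, workflow in (("one_stage", one_stage), ("two_stage", two_stage)):
        backbone, head = workflow(rng, X_train, y_train)
        results[name] = accuracy(backbone, head, X_test, y_test)
    return results


if __name__ == "__main__":
    print(compare_workflows(seed=0))
```

Repeating `compare_workflows` over many seeds, in the spirit of the repeated experiments per configuration mentioned above, would give two accuracy distributions that differ only in workflow, since the learning paradigm is supervised in both cases.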

Espis, A., Marzi, C., Diciotti, S. (2025). Self-supervised and supervised learning in medical imaging classification: addressing the hidden bias of workflow design [10.1109/EMBC58623.2025.11254853].

Self-supervised and supervised learning in medical imaging classification: addressing the hidden bias of workflow design

Espis A. (first author); Diciotti S. (last author)
2025

Annu Int Conf IEEE Eng Med Biol Soc 2025, pp. 1-6
Espis, A.; Marzi, C.; Diciotti, S.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1050890