Effect of data leakage in brain MRI classification using 2D convolutional neural networks

Yagis, E.; Atnafu, S. W.; Garcia Seco De Herrera, A.; Marzi, C.; Scheda, R.; Giannelli, M.; Tessa, C.; Citi, L.; Diciotti, S.

doi:10.1038/s41598-021-01681-w

In recent years, 2D convolutional neural networks (CNNs) have been extensively used to diagnose neurological diseases from magnetic resonance imaging (MRI) data due to their potential to discern subtle and intricate patterns. Despite the high performance reported in numerous studies, developing CNN models with good generalization abilities is still a challenging task due to possible data leakage introduced during cross-validation (CV). In this study, we quantitatively assessed the effect of data leakage caused by splitting 3D MRI data at the 2D slice level, using three 2D CNN models to classify patients with Alzheimer’s disease (AD) and Parkinson’s disease (PD). Our experiments showed that slice-level CV erroneously boosted the average slice-level accuracy on the test set by 30% on Open Access Series of Imaging Studies (OASIS), 29% on Alzheimer’s Disease Neuroimaging Initiative (ADNI), 48% on Parkinson’s Progression Markers Initiative (PPMI) and 55% on a local de-novo PD Versilia dataset. Further tests on a randomly labeled OASIS-derived dataset produced about 96% (erroneous) accuracy with a slice-level split and 50% accuracy with a subject-level split, the latter being the value expected from a randomized experiment. Overall, the effect of an erroneous slice-based CV is severe, especially for small datasets.
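The difference between a slice-level and a subject-level split can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`make_slices`, `slice_level_split`, `subject_level_split`, `shared_subjects`), the toy dataset of 20 subjects with 10 slices each, and the single 80/20 hold-out in place of full cross-validation are all assumptions made for illustration. The sketch only counts how many subjects contribute slices to both the training and the test set under each strategy.

```python
import random

def make_slices(n_subjects=20, slices_per_subject=10):
    """Toy dataset (hypothetical): one (subject_id, slice_index) pair per 2D slice."""
    return [(s, k) for s in range(n_subjects) for k in range(slices_per_subject)]

def slice_level_split(slices, test_fraction=0.2, seed=0):
    """Leaky split: shuffle individual slices, ignoring which subject they belong to."""
    rng = random.Random(seed)
    shuffled = slices[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def subject_level_split(slices, test_fraction=0.2, seed=0):
    """Leak-free split: assign whole subjects, with all their slices, to one side."""
    rng = random.Random(seed)
    subjects = sorted({s for s, _ in slices})
    rng.shuffle(subjects)
    n_test = int(len(subjects) * test_fraction)
    test_subjects = set(subjects[:n_test])
    train = [x for x in slices if x[0] not in test_subjects]
    test = [x for x in slices if x[0] in test_subjects]
    return train, test

def shared_subjects(train, test):
    """Subjects that have slices in both the training and the test set."""
    return {s for s, _ in train} & {s for s, _ in test}

slices = make_slices()
leaky_train, leaky_test = slice_level_split(slices)
clean_train, clean_test = subject_level_split(slices)

print("subjects shared (slice-level):  ", len(shared_subjects(leaky_train, leaky_test)))
print("subjects shared (subject-level):", len(shared_subjects(clean_train, clean_test)))
```

With a slice-level split, slices of the same subject end up on both sides, so a model can score well by recognizing the subject rather than the disease. The subject-level split keeps every subject entirely in one set, which is the behaviour the abstract treats as the correct reference.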

Yagis E., Atnafu S.W., Garcia Seco de Herrera A., Marzi C., Scheda R., Giannelli M., et al. (2021). Effect of data leakage in brain MRI classification using 2D convolutional neural networks. SCIENTIFIC REPORTS, 11(1), 1-13 [10.1038/s41598-021-01681-w].



Year: 2021
Journal: SCIENTIFIC REPORTS
DOI: https://dx.doi.org/10.1038/s41598-021-01681-w
Citation: Yagis E., Atnafu S.W., Garcia Seco de Herrera A., Marzi C., Scheda R., Giannelli M., et al. (2021). Effect of data leakage in brain MRI classification using 2D convolutional neural networks. SCIENTIFIC REPORTS, 11(1), 1-13 [10.1038/s41598-021-01681-w].
All authors: Yagis E.; Atnafu S.W.; Garcia Seco de Herrera A.; Marzi C.; Scheda R.; Giannelli M.; Tessa C.; Citi L.; Diciotti S.
Appears in types: 1.01 Journal article

Files in this record:

File	Access	Type	License	Size	Format
yagis21.pdf	open access	Publisher's version (PDF) / Version of Record	Creative Commons	2.16 MB	Adobe PDF
41598_2021_1681_MOESM1_ESM.pdf	open access	Supplementary file	Open-access license, Creative Commons Attribution (CC BY)	870.61 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/872110

