Comparison of Machine Learning Classifiers on Integrated Transcriptomic Data

Testa, Irene; Prencipe, Giuseppe; Priami, Corrado; Sirbu, Alina

doi:10.1109/BigData59044.2023.10386445

Omics data are being generated for different conditions and can be a valuable resource for building novel predictive models for medical diagnosis. Because each dataset contains only a small number of samples, applying Machine Learning (ML) models requires data integration. At the same time, multiple ML models are available, and the best option for data integration is not known. These challenges have typically been addressed in restricted settings, i.e., for a single disease at a time, and a thorough comparison of models on integrated data across different conditions is still missing. In this paper we compare 7 classifiers on integrated data for 6 diseases, over 14 datasets. We compare the models on single and integrated datasets, employing different pre-processing techniques. We also evaluate the effect of feature selection, analyzing the robustness and relevance of the extracted features. We observe that, even though integration slightly reduces predictive power, the models still produce good classifications. When generalization is tested on new datasets, however, performance sometimes decreases drastically, depending on the disease studied.
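The evaluation idea summarized in the abstract (integrate several small datasets, apply a pre-processing step, compare classifiers, and test generalization on a dataset not seen in training) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic data, the per-dataset z-scoring used as the pre-processing step, the two toy classifiers (nearest centroid and 1-nearest-neighbour), and all function names are hypothetical stand-ins for the 7 classifiers and the pre-processing techniques the authors actually used.

```python
import random
import statistics

def make_dataset(n, shift, scale, seed):
    """Synthetic two-class dataset with a dataset-specific batch effect (illustrative only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        signal = rng.gauss(2.0 * label, 1.0)  # feature carrying the class signal
        noise = rng.gauss(0.0, 1.0)           # uninformative feature
        # Batch effect: each dataset rescales and shifts all features.
        X.append([scale * signal + shift, scale * noise + shift])
        y.append(label)
    return X, y

def identity(X):
    """No pre-processing: datasets are concatenated as they are."""
    return [list(row) for row in X]

def zscore(X):
    """Standardize each feature within a single dataset (assumed pre-processing)."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_centroid(train_X, train_y, test_X):
    """Predict the class whose mean training vector is closest."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]
    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab])) for x in test_X]

def one_nn(train_X, train_y, test_X):
    """Predict the label of the single closest training sample."""
    preds = []
    for x in test_X:
        best = min(range(len(train_X)), key=lambda i: sq_dist(x, train_X[i]))
        preds.append(train_y[best])
    return preds

def leave_one_dataset_out(datasets, classifier, preprocess):
    """Train on the integration of all datasets but one, test on the held-out one."""
    scores = []
    for held in range(len(datasets)):
        train_X, train_y = [], []
        for i, (X, y) in enumerate(datasets):
            if i != held:
                train_X.extend(preprocess(X))
                train_y.extend(y)
        test_X, test_y = datasets[held]
        # The held-out dataset is pre-processed with its own (label-free) statistics.
        preds = classifier(train_X, train_y, preprocess(test_X))
        correct = sum(p == t for p, t in zip(preds, test_y))
        scores.append(correct / len(test_y))
    return scores

# Three hypothetical datasets for the same condition, each with its own batch effect.
datasets = [
    make_dataset(60, shift=0.0, scale=1.0, seed=1),
    make_dataset(60, shift=10.0, scale=2.0, seed=2),
    make_dataset(60, shift=-10.0, scale=0.5, seed=3),
]

results = {}
for clf in (nearest_centroid, one_nn):
    for prep in (identity, zscore):
        results[(clf.__name__, prep.__name__)] = leave_one_dataset_out(datasets, clf, prep)

for key, scores in results.items():
    print(key, [round(s, 2) for s in scores])
```

Each entry of `results` holds one accuracy per held-out dataset, so the spread across entries gives a rough picture of how much generalization depends on the classifier and on the pre-processing choice. On real transcriptomic data the comparison would also need feature selection and a metric suited to class imbalance, which this sketch leaves out.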

Testa, I., Prencipe, G., Priami, C., Sirbu, A. (2023). Comparison of Machine Learning Classifiers on Integrated Transcriptomic Data [10.1109/BigData59044.2023.10386445].



Year: 2023
Volume title: 2023 IEEE International Conference on Big Data (BigData)
First page: 4987
Last page: 4996
DOI: https://dx.doi.org/10.1109/BigData59044.2023.10386445


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1008589
