Improving stability of prediction models based on correlated omics data by using network approaches

Tissier R.; Houwing-Duistermaat J.; Rodriguez-Girondo M.

2018

Abstract

Building prediction models based on complex omics datasets, such as transcriptomics, proteomics, and metabolomics, remains a challenge in bioinformatics and biostatistics. Regularized regression techniques are typically used to deal with the high dimensionality of these datasets. However, because the features in these datasets are correlated, it is difficult to select the best model, and applying these methods yields unstable results. We propose a novel strategy for model selection in which the obtained models also perform well in terms of overall predictability. Several three-step approaches are considered, where the steps are 1) network construction, 2) clustering to empirically derive modules or pathways, and 3) building a prediction model that incorporates the information on the modules. For the first step, we use weighted correlation networks and Gaussian graphical modelling. Groups of features are identified by hierarchical clustering. The grouping information is included in the prediction model by using group-based variable selection or group-specific penalization. We compare the performance of our new approaches with standard regularized regression via simulations. Based on these results, we provide recommendations for selecting a strategy for building a prediction model given the specific goal of the analysis and the sizes of the datasets. Finally, we illustrate the advantages of our approach by applying the methodology to two problems: prediction of body mass index in the DIetary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome study (DILGOM), and prediction of the response of each breast cancer cell line to treatment with specific drugs using a breast cancer cell lines pharmacogenomics dataset.
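
The three steps named in the abstract can be sketched as follows. This is a minimal illustration on simulated data, not the authors' implementation: the soft-threshold power `beta = 6`, the number of modules, the penalty values, and all function names are assumptions made for the example. A plain average-linkage routine and a closed-form ridge with one penalty per module stand in for the clustering and group-specific penalization methods used in the paper.

```python
import numpy as np


def weighted_correlation_adjacency(X, beta=6):
    """Step 1: soft-thresholded adjacency |cor|^beta (beta is an assumed value)."""
    corr = np.corrcoef(X, rowvar=False)
    return np.abs(corr) ** beta


def average_linkage_modules(adjacency, n_modules):
    """Step 2: agglomerative average-linkage clustering on 1 - adjacency."""
    dist = 1.0 - adjacency
    clusters = [[j] for j in range(dist.shape[0])]
    while len(clusters) > n_modules:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise dissimilarity between the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(dist.shape[0], dtype=int)
    for g, members in enumerate(clusters):
        labels[members] = g
    return labels


def group_ridge(X, y, labels, penalties):
    """Step 3: ridge regression with one penalty per module (closed form)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    lam = np.asarray(penalties, dtype=float)[labels]
    return np.linalg.solve(Xc.T @ Xc + np.diag(lam), Xc.T @ yc)


# Simulated data: two modules of five correlated features each;
# only the first module is related to the outcome.
rng = np.random.default_rng(0)
n = 200
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [z1 + 0.5 * rng.normal(size=n) for _ in range(5)]
    + [z2 + 0.5 * rng.normal(size=n) for _ in range(5)]
)
y = 2.0 * z1 + 0.5 * rng.normal(size=n)

adjacency = weighted_correlation_adjacency(X)
labels = average_linkage_modules(adjacency, n_modules=2)
# Hypothetical penalties: light for module 0, heavy for module 1.
coef = group_ridge(X, y, labels, penalties=[1.0, 100.0])
print("module labels:", labels)
print("coefficients:", np.round(coef, 3))
```

In practice the module-specific penalties would be tuned, for example by cross-validation, rather than fixed by hand as they are here.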


Year: 2018
Journal: PLOS ONE
DOI: https://dx.doi.org/10.1371/journal.pone.0192853
Citation: Improving stability of prediction models based on correlated omics data by using network approaches / Tissier R.; Houwing-Duistermaat J.; Rodriguez-Girondo M. - In: PLOS ONE. - ISSN 1932-6203. - Electronic. - 13:2(2018), pp. e0192853.1-e0192853.23. [10.1371/journal.pone.0192853]
All authors: Tissier R.; Houwing-Duistermaat J.; Rodriguez-Girondo M.
Appears in types: 1.01 Journal article

Files in this record:

File	Size	Format
file (1).pdf (open access; type: publisher's version (PDF); license: Open Access license, Creative Commons Attribution (CC BY))	3.35 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/879784
