methyLImp2

Plaksienko, Anna; Di Lena, Pietro; Nardini, Christine; Angelini, Claudia

Motivation: methyLImp, a method we recently introduced for the missing value estimation of DNA methylation data, has demonstrated competitive performance in data imputation compared to existing general-purpose approaches. However, its imputation running time was considerably long and became infeasible for large datasets with numerous missing values. Results: methyLImp2 makes previously infeasible computations possible. We achieved this by introducing two important modifications that significantly reduce the original running time without sacrificing prediction performance. First, we implemented a chromosome-wise parallel version of methyLImp. This parallelization reduced the runtime by a factor of several tens in our experiments. Second, to handle large datasets, we introduced a mini-batch approach that uses only a subset of the samples for the imputation, which further reduces the running time on large datasets from days to hours or even minutes. Availability and implementation: The R package methyLImp2 is under review for Bioconductor. It is currently freely available on GitHub at https://github.com/annaplaksienko/methyLImp2.
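The two speed-ups described in the abstract can be illustrated with a rough Python sketch. It is not the methyLImp2 R package or its exact model: the per-CpG least-squares regression on fully observed CpGs is a simplified stand-in, and the function names `impute_block` and `impute_by_chromosome` and their parameters (`batch_size`, `n_workers`, `seed`) are hypothetical. It assumes a samples x CpGs matrix of beta values with NaN marking missing entries, plus one chromosome label per CpG. Each chromosome block is imputed independently (the parallel part), and an optional `batch_size` fits each regression on a random subset of the observed samples (the mini-batch part).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def impute_block(block, batch_size=None, seed=0):
    """Impute NaNs in one chromosome's block (samples x CpGs).

    Hypothetical stand-in model: each CpG with missing values is regressed,
    by least squares, on the CpGs of the same block that have no missing values.
    """
    block = np.array(block, dtype=float, copy=True)
    rng = np.random.default_rng(seed)
    missing = np.isnan(block)
    complete_cols = ~missing.any(axis=0)
    # Design matrix: intercept column plus all fully observed CpGs.
    # If no CpG is complete, this reduces to an intercept-only (mean) model.
    design = np.column_stack([np.ones(block.shape[0]), block[:, complete_cols]])
    for j in np.where(~complete_cols)[0]:
        rows_obs = np.where(~missing[:, j])[0]
        rows_mis = np.where(missing[:, j])[0]
        if rows_obs.size == 0:
            continue  # nothing observed for this CpG: leave it as NaN
        # Mini-batch: fit on a random subset of the observed samples only.
        if batch_size is not None and rows_obs.size > batch_size:
            rows_obs = rng.choice(rows_obs, size=batch_size, replace=False)
        coef, *_ = np.linalg.lstsq(design[rows_obs], block[rows_obs, j], rcond=None)
        # Beta values lie in [0, 1], so clip the predictions to that range.
        block[rows_mis, j] = np.clip(design[rows_mis] @ coef, 0.0, 1.0)
    return block


def impute_by_chromosome(X, chroms, batch_size=None, n_workers=4, seed=0):
    """Split CpG columns by chromosome and impute each block concurrently."""
    chroms = np.asarray(chroms)
    labels = list(dict.fromkeys(chroms.tolist()))  # unique labels, first-seen order
    col_sets = [np.where(chroms == c)[0] for c in labels]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(impute_block, X[:, cols], batch_size, seed + i)
            for i, cols in enumerate(col_sets)
        ]
        results = [f.result() for f in futures]
    out = np.array(X, dtype=float, copy=True)
    for cols, res in zip(col_sets, results):
        out[:, cols] = res  # write each imputed block back to its columns
    return out
```

Because blocks never share columns, they can be processed in any order without coordination. A thread pool keeps the sketch simple, since NumPy's linear algebra can release the GIL. A process pool would sidestep the GIL entirely at the cost of copying each block to a worker.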

Anna Plaksienko, P.D.L. (2024). methyLImp2.

Year: 2024

Authors: Anna Plaksienko, Pietro Di Lena, Christine Nardini, Claudia Angelini

Appears in category: 7.04 Software


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/962589

Warning: the data displayed have not been validated by the university.

