
METHYLATION CHALLENGES & OPPORTUNITIES FOR BIOMARKERS IDENTIFICATION – FOCUS ON IMPUTATION

P. Di Lena; Claudia Sala; Christine Nardini
2020

Abstract

Physiological conditions (development, aging) and pathological conditions (autoimmune diseases, cancers and numerous other diseases) are strongly influenced by DNA methylation, a stable epigenetic modification that occurs in the cell nucleus throughout an individual's life. A large part of methylation research focuses on developing molecular age estimation methods based on DNA methylation levels (mAge [1]) as a potential early marker of disease. Many studies indicate that divergence between mAge and chronological age may be a powerful indicator of non-physiological conditions [2]. This research has been boosted by the evolution of high-throughput technologies that quantify DNA methylation levels across the human genome. Several mechanisms still need to be elucidated, yet the peculiar biochemistry of the phenomenon can be used as a basis to enhance current analysis approaches. In particular, estimation of mAge can be impaired by multiple missing values. Although several imputation methods exist, a major deficiency is their inability to cope with large datasets, such as those produced by DNA methylation chips. We present a simple and computationally efficient imputation method, methyLImp, based on linear regression [3]. The rationale of the approach is the observation that methylation levels show a high degree of inter-sample correlation. We performed a comparative study of our approach against other imputation methods on DNA methylation data from healthy and disease samples from different tissues. Performance was assessed in terms of both imputation accuracy and mAge estimation. Our linear regression model performs equally well or better in terms of accuracy, with better computational efficiency. Finally, we highlight future directions and potential applications that may benefit from preserving dataset completeness, which data imputation helps to ensure.
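The general idea of regression-based imputation can be illustrated with a minimal sketch. This is not the authors' methyLImp implementation: the function name `impute_linear_regression`, the samples-by-sites layout, the use of only fully observed sites as predictors, the intercept term, and the clipping of predictions to the [0, 1] range are all assumptions made here for illustration.

```python
import numpy as np


def impute_linear_regression(data):
    """Fill NaN entries of `data` by ordinary least-squares regression.

    Hypothetical helper, for illustration only.
    Assumed layout: rows are samples, columns are CpG sites, and values
    are methylation levels in [0, 1] with NaN marking missing entries.
    """
    data = np.asarray(data, dtype=float)
    filled = data.copy()
    missing = np.isnan(data)

    # Assumption: only columns without any missing value act as predictors.
    complete_cols = ~missing.any(axis=0)
    if not complete_cols.any():
        raise ValueError("at least one fully observed column is required")

    # Design matrix: intercept plus all fully observed columns.
    X = np.column_stack([np.ones(data.shape[0]), data[:, complete_cols]])

    for j in np.flatnonzero(~complete_cols):
        miss_rows = missing[:, j]
        obs_rows = ~miss_rows
        if not obs_rows.any():
            continue  # no observed value to fit on; leave the column as is

        # Fit on samples where column j is observed.
        coef, *_ = np.linalg.lstsq(X[obs_rows], data[obs_rows, j], rcond=None)

        # Predict the missing samples and keep values in the valid range.
        filled[miss_rows, j] = np.clip(X[miss_rows] @ coef, 0.0, 1.0)

    return filled


# Toy example: the third column is an exact linear function of the first.
toy = np.array([
    [0.1, 0.9, 0.15],
    [0.2, 0.1, 0.20],
    [0.3, 0.5, 0.25],
    [0.4, 0.3, 0.30],
    [0.5, 0.7, np.nan],
])
toy_filled = impute_linear_regression(toy)
```

In this toy matrix the missing entry is recovered from the linear relationship among columns; on real chip data the quality of the fit would depend on how strongly the sites are correlated and on how many samples are available.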
The Third International Conference on Mathematics and Statistics | AUS-ICMS ’20 - Book of Abstracts, p. 73
P. Di Lena; Claudia Sala; Andrea Prodi; Christine Nardini

Use this identifier to cite or link to this document: http://hdl.handle.net/11585/725800