Background: Statistical approaches to genetic sequences have revealed helpful to gain deeper insight into biological and structural functionalities, using ideas coming from information theory and stochastic modelling of symbolic sequences. In particular, previous analyses on CG dinucleotide position along the genome allowed to highlight its epigenetic role in DNA methylation, showing a different distribution tail as compared to other dinucleotides. In this paper we extend the analysis to the whole CG distance distribution over a selected set of higher-order organisms. Then we apply the best fitting probability density function to a large range of organisms (>4400) of different complexity (from bacteria to mammals) and we characterize some emerging global features. Results: We find that the Gamma distribution is optimal for the selected subset as compared to a group of several distributions, chosen for their physical meaning or because recently used in literature for similar studies. The parameters of this distribution, when applied to our larger set of organisms, allows to highlight some biologically relavant features for the considered organism classes, that can be useful also for classification purposes. Conclusions: The quantification of statistical properties of CG dinucleotide positioning along the genome is confirmed as a useful tool to characterize broad classes of organisms, spanning the whole range of biological complexity.

Merlotti, A., Faria do Valle, I., Castellani, G., Remondini, D. (2018). Statistical modelling of CG interdistance across multiple organisms. BMC BIOINFORMATICS, 19(Suppl 10), 355-362 [10.1186/s12859-018-2303-2].

Statistical modelling of CG interdistance across multiple organisms

Merlotti, A.;Faria do Valle, I.;Castellani, G.;Remondini, D.
2018

Abstract

Background: Statistical approaches to genetic sequences have revealed helpful to gain deeper insight into biological and structural functionalities, using ideas coming from information theory and stochastic modelling of symbolic sequences. In particular, previous analyses on CG dinucleotide position along the genome allowed to highlight its epigenetic role in DNA methylation, showing a different distribution tail as compared to other dinucleotides. In this paper we extend the analysis to the whole CG distance distribution over a selected set of higher-order organisms. Then we apply the best fitting probability density function to a large range of organisms (>4400) of different complexity (from bacteria to mammals) and we characterize some emerging global features. Results: We find that the Gamma distribution is optimal for the selected subset as compared to a group of several distributions, chosen for their physical meaning or because recently used in literature for similar studies. The parameters of this distribution, when applied to our larger set of organisms, allows to highlight some biologically relavant features for the considered organism classes, that can be useful also for classification purposes. Conclusions: The quantification of statistical properties of CG dinucleotide positioning along the genome is confirmed as a useful tool to characterize broad classes of organisms, spanning the whole range of biological complexity.
2018
Merlotti, A., Faria do Valle, I., Castellani, G., Remondini, D. (2018). Statistical modelling of CG interdistance across multiple organisms. BMC BIOINFORMATICS, 19(Suppl 10), 355-362 [10.1186/s12859-018-2303-2].
Merlotti, A.; Faria do Valle, I.; Castellani, G.*; Remondini, D.
File in questo prodotto:
File Dimensione Formato  
CG interdistance multiple organisms_Merlotti Remondini_BMCBioinfo18.pdf

accesso aperto

Tipo: Versione (PDF) editoriale
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione 896.23 kB
Formato Adobe PDF
896.23 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/678106
Citazioni
  • ???jsp.display-item.citation.pmc??? 1
  • Scopus 3
  • ???jsp.display-item.citation.isi??? 1
social impact