A. Magi, L. Tattini, F. Palombo, M. Benelli, A. Gialluisi, B. Giusti, et al. (2014). H3M2: detection of runs of homozygosity from whole-exome sequencing data. BIOINFORMATICS, 30, 2852-2859 [10.1093/bioinformatics/btu401].
H3M2: detection of runs of homozygosity from whole-exome sequencing data
SERI, MARCO;ROMEO, GIOVANNI;PIPPUCCI, TOMMASO
2014
Abstract
MOTIVATION: Runs of homozygosity (ROHs) are sizable chromosomal stretches of homozygous genotypes, ranging in length from tens of kilobases to megabases. ROHs are relevant to population and medical genetics, as they play a role in predisposition to both rare and common disorders. ROHs are commonly detected with single nucleotide polymorphism (SNP) microarrays, but attempts have been made to use whole-exome sequencing (WES) data. Currently available methods, developed for the analysis of uniformly spaced SNP-array maps, are poorly suited to the sparse and non-uniform marker distribution of the WES target design. RESULTS: To meet the need for an approach specifically tailored to WES data, we developed H3M2, an original algorithm based on a heterogeneous hidden Markov model that incorporates inter-marker distances to detect ROHs from WES data. We evaluated the ability of H3M2 to correctly identify ROHs on synthetic chromosomes and examined its accuracy in detecting ROHs of different lengths (short, medium and long) in real 1000 Genomes Project data. H3M2 proved more accurate than GERMLINE and PLINK, two state-of-the-art algorithms, especially in the detection of short and medium ROHs. AVAILABILITY AND IMPLEMENTATION: H3M2 is a collection of bash, R and Fortran scripts and code and is freely available at https://sourceforge.net/projects/h3m2/.
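To illustrate what it means for a hidden Markov model to be heterogeneous, that is, to let its transition probabilities depend on the distance between adjacent markers, the following Python sketch decodes a toy two-state model (ROH vs. non-ROH) with the Viterbi algorithm. It is not the H3M2 implementation: the binary heterozygous/homozygous observations, the exponential form of `switch_prob`, and all parameter values are assumptions chosen only for illustration.

```python
import math


def switch_prob(distance_bp, norm_bp=100_000.0, max_switch=0.5):
    """Probability of changing state between two markers distance_bp apart.

    Hypothetical form: grows with distance and saturates at max_switch,
    so widely spaced markers are allowed to change state more easily.
    """
    p = max_switch * (1.0 - math.exp(-distance_bp / norm_bp))
    return max(p, 1e-12)  # avoid log(0) for markers at the same position


def viterbi_roh(het_calls, positions, p_het_in_roh=0.01, p_het_outside=0.3):
    """Return the most likely state per marker: 1 = ROH, 0 = non-ROH.

    het_calls: list of 0 (homozygous) or 1 (heterozygous) per marker.
    positions: marker coordinates in base pairs, in increasing order.
    The emission probabilities are illustrative assumptions.
    """
    if not het_calls:
        return []
    # Emission log-probabilities, indexed [state][observation].
    emit = [
        [math.log(1 - p_het_outside), math.log(p_het_outside)],  # non-ROH
        [math.log(1 - p_het_in_roh), math.log(p_het_in_roh)],    # ROH
    ]
    score = [math.log(0.5) + emit[s][het_calls[0]] for s in (0, 1)]
    back = []
    for i in range(1, len(het_calls)):
        # The transition matrix is rebuilt at every step from the distance
        # to the previous marker: this is the "heterogeneous" part.
        p = switch_prob(positions[i] - positions[i - 1])
        log_stay, log_switch = math.log(1 - p), math.log(p)
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            if stay >= switch:
                best, prev = stay, s
            else:
                best, prev = switch, 1 - s
            new_score.append(best + emit[s][het_calls[i]])
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    calls = [1, 1] + [0] * 30 + [1, 1]
    pos = [i * 10_000 for i in range(len(calls))]
    print(viterbi_roh(calls, pos))
```

In a homogeneous model the same transition matrix would be used between every pair of markers; here a large gap between two exome targets makes a state change more likely than a small gap does, which is the general idea the abstract describes for sparse, non-uniformly spaced WES markers.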