Predicting the effect of Single Nucleotide Polymorphisms (SNPs) is one of the most ambitious challenges in computational biology. SNPs account for about 90% of genetic variation in the human population. Recent investigations focus on non-synonymous coding SNPs, which cause single point mutations in proteins, since mutations occurring in coding regions may affect gene function. Gene Ontology (GO) provides a curated vocabulary for describing gene function. We propose a GO log-odds score to discriminate functionally relevant genes. In this work we present a machine learning-based method to predict the effect of a given mutation on human health. In particular, we study the relationship between SNPs and the onset of cancer using a support vector machine (SVM). To this end, we developed an SVM-based predictor (PhD-SNP-C) that combines in a single input vector various features derived from the protein sequence, the sequence profile, and a GO-based score. The proposed predictor reaches an overall accuracy of 75% and a correlation coefficient of 0.50 on a set of 1087 cancer-related mutations and 1100 randomly selected mutations annotated as neutral in the Swiss-Prot dataset. On the same set of proteins, a similar SVM-based method that does not take into account the GO log-odds score reaches 66% accuracy with a correlation coefficient of 0.32. Our results indicate that including information derived from GO annotations improves the prediction of cancer-related mutations. Overall, the accuracy of PhD-SNP-C is 9 percentage points higher than that of our previous SVM-based method, with a gain of 0.18 in the correlation coefficient. Furthermore, when predictions are filtered by their reliability index (RI) at RI 3 (comprising 71% of the dataset), PhD-SNP-C reaches 82% accuracy with a correlation coefficient of 0.64.
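The abstract does not give the formula for the GO log-odds score, so the following is only a minimal sketch of one plausible form: for each GO term, the logarithm of the ratio between its smoothed frequency among cancer-related proteins and among neutral ones, summed over the terms annotating a protein. The function names, the pseudocount smoothing, the summation over terms, and the placeholder term identifiers are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def go_log_odds(disease_annotations, neutral_annotations, pseudocount=1.0):
    """Build a per-term log-odds table (hypothetical formulation).

    Each argument is a list with one collection of GO terms per protein.
    A pseudocount keeps the ratio finite for terms seen in only one class.
    """
    disease_counts = Counter(t for terms in disease_annotations for t in set(terms))
    neutral_counts = Counter(t for terms in neutral_annotations for t in set(terms))
    n_disease = len(disease_annotations)
    n_neutral = len(neutral_annotations)

    scores = {}
    for term in set(disease_counts) | set(neutral_counts):
        # Smoothed fraction of proteins in each class annotated with this term.
        f_disease = (disease_counts[term] + pseudocount) / (n_disease + 2 * pseudocount)
        f_neutral = (neutral_counts[term] + pseudocount) / (n_neutral + 2 * pseudocount)
        scores[term] = math.log(f_disease / f_neutral)
    return scores


def protein_score(go_terms, scores):
    """Sum the log-odds of a protein's GO terms; unseen terms contribute 0."""
    return sum(scores.get(term, 0.0) for term in set(go_terms))


# Toy data with placeholder term names (not real GO identifiers).
disease = [{"GO:A", "GO:B"}, {"GO:A"}]
neutral = [{"GO:B"}, {"GO:C"}]
table = go_log_odds(disease, neutral)
example_score = protein_score({"GO:A", "GO:C"}, table)
```

Under this formulation a positive score means the protein's annotations are enriched among cancer-related proteins in the training data. A value of this kind could be appended as one more component of the SVM input vector alongside the sequence and profile features.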
Calabrese R., Capriotti E., Martelli P.L., Fariselli P., Casadio R. (2008). Gene Ontology annotation improves the prediction of cancer-related mutations. INNSBRUCK : s.n.
Gene Ontology annotation improves the prediction of cancer-related mutations
CALABRESE, REMO; CAPRIOTTI, EMIDIO; MARTELLI, PIER LUIGI; FARISELLI, PIERO; CASADIO, RITA
2008