The prediction of organelle targeting peptides in eukaryotic proteins with Grammatical Restrained Hidden Conditional Random Fields.

Martelli, Pier Luigi; Indio, Valentina; Savojardo, Castrense; Fariselli, Piero; Casadio, Rita
2012

Abstract

Targeting peptides are the most important signal controlling the import of nuclear-encoded proteins into mitochondria and plastids. In the absence of experimental information, their prediction is an essential step in proteome annotation, for inferring both the localization and the sequence of mature proteins. We developed TPpred, a new predictor of organelle targeting peptides based on Grammatical Restrained Hidden Conditional Random Fields, a recently introduced machine-learning tool well suited to solving labeling problems (Fariselli et al., 2009). TPpred is trained on a non-redundant dataset of 297 protein sequences in which the presence of a targeting peptide was experimentally validated. When tested on the 297 positive examples and 8010 additional negative examples, TPpred outperforms available methods in both accuracy and Matthews correlation index (96% and 0.59, respectively). Given its very low false positive rate (3.0%), TPpred is well suited for large-scale analyses at the proteome level. We predicted that about 4% to 9% of the sequences in the human, Arabidopsis thaliana and yeast proteomes contain targeting peptides and are therefore likely to be localized in mitochondria and plastids. TPpred predictions correlate to a good extent with the experimental annotation of subcellular localization, when available. TPpred was also trained and tested to predict the cleavage site of the organelle targeting peptide; on this task, the average error of TPpred on mitochondrial and plastidic proteins is 7 and 15 residues, respectively. These values are lower than the errors reported for other currently available methods.
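
The evaluation measures quoted in the abstract (accuracy, Matthews correlation index, false positive rate, and average cleavage-site error) follow standard definitions. The following is a minimal sketch of those definitions, not the authors' evaluation code. The function names are illustrative, the counts and positions are hypothetical and are not the TPpred results, and the cleavage-site error is assumed here to be the mean absolute distance, in residues, between predicted and annotated sites.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all examples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(tn, fp):
    """Fraction of true negatives that are wrongly predicted as positive."""
    return fp / (fp + tn)


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation for a binary confusion matrix (range -1 to 1)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the value is set to 0 when any marginal sum is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_cleavage_error(predicted, observed):
    """Mean absolute distance (in residues) between predicted and
    annotated cleavage sites. ASSUMPTION: this is how 'average error'
    is interpreted here; the abstract does not give the formula."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 80, 880, 20, 20
acc = accuracy(tp, tn, fp, fn)
fpr = false_positive_rate(tn, fp)
mcc = matthews_correlation(tp, tn, fp, fn)

# Hypothetical predicted vs. annotated cleavage positions.
err = mean_cleavage_error([30, 45, 52], [28, 45, 60])
```

On a dataset as unbalanced as the one described (297 positives against 8010 negatives), accuracy alone can be high even for a weak predictor, which is why the Matthews correlation and the false positive rate are reported alongside it.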
Proceedings of the 56th national meeting of the Italian Society of Biochemistry and Molecular Biology, p. 210

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/144861