Predicting protein structure and function from sequence with BAR, the Bologna Annotation Resource.

CASADIO, RITA
2009

Abstract

Protein sequence annotation is a major challenge in the post-genomic era. The number of uncharacterized gene products is growing rapidly, and consequently well-established methods of gene and protein annotation, such as those based on homology transfer, can annotate an ever smaller fraction of the data. Automated systems capable of exploiting information at different levels are therefore necessary. A first problem is defining what a protein function is, since the meaning of the term varies with the context in which it is used. To this end, the Gene Ontology vocabulary (http://www.geneontology.org/) can provide a standard way for programs to output their function predictions. Historically, automated function prediction has been performed by homology-based transfer, relying on different tools for measuring sequence similarity, the most popular being BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The finding that, in the PDB database (http://www.pdb.org/pdb/home/home.do), structures are generally more conserved than sequences prompted the development of more elaborate systems that exploit, to different extents, structural information, phylogenetic considerations, sequence patterns, structural alignments and structural patterns. Thanks to the availability of complete genomes and proteomes, protein annotation has recently benefited greatly from cross-genome comparisons. We recently described a new non-hierarchical clustering procedure characterized by a stringent metric, which ensures a reliable transfer of function between related proteins even in the case of multi-domain and distantly related proteins. The method relies on the comparative analysis of 599 completely sequenced genomes, from both prokaryotes and eukaryotes, and on a mapping of GO terms and PDB/SCOP structures onto the clusters.
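The abstract does not spell out the metric, so the following is only a minimal Python sketch of how a non-hierarchical, threshold-based clustering can work: proteins are nodes, a pair is linked only when its alignment passes a stringent test, and clusters are the connected components (found here with union-find). The `Hit` record, the thresholds (40% identity, 90% coverage of both sequences) and the input format are hypothetical choices for illustration, not parameters stated in this abstract.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One pairwise alignment between two proteins (hypothetical record)."""
    query: str
    subject: str
    identity: float     # percent identity over the aligned region
    query_cov: float    # fraction of the query length covered by the alignment
    subject_cov: float  # fraction of the subject length covered by the alignment


# Illustrative thresholds (assumption: not stated in the abstract).
MIN_IDENTITY = 40.0
MIN_COVERAGE = 0.90


def is_stringent_link(hit: Hit) -> bool:
    # High coverage on BOTH sequences keeps multi-domain proteins from being
    # chained together through a single shared domain.
    return (hit.identity >= MIN_IDENTITY
            and hit.query_cov >= MIN_COVERAGE
            and hit.subject_cov >= MIN_COVERAGE)


def cluster(proteins: list[str], hits: list[Hit]) -> list[set[str]]:
    """Non-hierarchical clustering: connected components of the link graph."""
    parent = {p: p for p in proteins}  # union-find forest, one node per protein

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for hit in hits:
        if is_stringent_link(hit):
            parent[find(hit.query)] = find(hit.subject)  # merge the two components

    groups: dict[str, set[str]] = {}
    for p in proteins:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())


# Toy data (made up): A and B pass the test; the B/C alignment covers
# only 40% of C, e.g. a single shared domain, so C stays on its own.
proteins = ["A", "B", "C", "D"]
hits = [
    Hit("A", "B", 55.0, 0.95, 0.93),
    Hit("B", "C", 62.0, 0.97, 0.40),
]
print(sorted(sorted(c) for c in cluster(proteins, hits)))
# → [['A', 'B'], ['C'], ['D']]
```

Because no single similarity cutoff builds a tree of nested groups, the result is a flat partition: each protein belongs to exactly one cluster, and only pairs that align over nearly their full length can pull two clusters together.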
A statistical validation of our method shows that our clustering technique captures the essential information shared between homologous and distantly related protein sequences. Consequently, uncharacterized proteins can be safely annotated by inheriting the functional and/or structural annotation of their cluster. We validated our method by blindly annotating a further 201 genomes, and finally developed BAR (the Bologna Annotation Resource), a prediction server for protein functional/structural annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).
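The inheritance step can be illustrated in the same hedged spirit. This abstract does not state how cluster annotations are validated, so the sketch below swaps in a common choice: a one-sided hypergeometric test for each GO term found in the cluster, with a Bonferroni correction over the terms tested. An uncharacterized member then inherits only the over-represented terms. The function names, `alpha`, and the toy data are all hypothetical.

```python
from __future__ import annotations

from collections import Counter
from math import comb


def hypergeom_tail(k: int, total: int, with_term: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total, with_term, drawn)."""
    favourable = sum(comb(with_term, i) * comb(total - with_term, drawn - i)
                     for i in range(k, min(with_term, drawn) + 1))
    return favourable / comb(total, drawn)


def cluster_terms(members: set[str], go: dict[str, set[str]],
                  alpha: float = 0.01) -> set[str]:
    """GO terms over-represented among the annotated members of a cluster."""
    annotated_total = sum(1 for terms in go.values() if terms)
    background = Counter(t for terms in go.values() for t in terms)
    annotated = [p for p in members if go.get(p)]
    in_cluster = Counter(t for p in annotated for t in go[p])
    if not in_cluster:
        return set()
    cutoff = alpha / len(in_cluster)  # Bonferroni correction over tested terms
    return {t for t, k in in_cluster.items()
            if hypergeom_tail(k, annotated_total, background[t], len(annotated)) < cutoff}


def annotate(query: str, clusters: list[set[str]], go: dict[str, set[str]],
             alpha: float = 0.01) -> set[str]:
    """Let an uncharacterized protein inherit its cluster's validated terms."""
    for members in clusters:
        if query in members:
            return cluster_terms(members, go, alpha)
    return set()  # no cluster: leave the protein unannotated


# Toy data (made up): 20 annotated proteins, one cluster with a query "Q".
go: dict[str, set[str]] = {
    "P1": {"GO:0004672", "GO:0005524"},  # protein kinase activity, ATP binding
    "P2": {"GO:0004672", "GO:0005524"},
    "P3": {"GO:0004672"},
    "P4": {"GO:0004672"},
}
go.update({f"P{i}": {"GO:0005524"} for i in range(5, 13)})
go.update({f"P{i}": {"GO:0003677"} for i in range(13, 21)})
clusters = [{"P1", "P2", "P3", "P4", "Q"}]  # "Q" has no annotation yet
print(annotate("Q", clusters, go))
# → {'GO:0004672'}
```

In the toy data, the kinase term is shared by every annotated member and rare elsewhere, so it is inherited; ATP binding is common in the background and does not pass the test.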
Proceedings of Protein Structure Prediction Workshop 2009
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/85721