Protein structure prediction in the genomic era: Annotation-facilitated remote homology detection

Casadio, Rita; Bartoli, Lisa; Fariselli, Piero; Tasco, Gianluca; Martelli, Pier Luigi
2014

Abstract

As a result of large sequencing projects, databases of protein sequences and structures are growing rapidly. The number of protein sequences, however, is orders of magnitude larger than the number of structures known at the atomic level, despite efforts to accelerate the procedures for resolving protein structures. Tools have been developed to bridge the gap between protein sequence and 3D structure. They rely on retrieving information from databases and on knowledge-based methods to provide solutions to the protein folding problem. Several machine learning approaches address sub-problems of protein structure prediction. These include the prediction of secondary structure, the recognition of domains, motifs and ligand-binding sites, and the prediction of membrane protein topology. In protein design, all these features can help to compute a putative protein model. This chapter reviews the state of the art of protein structure prediction. It also describes a recent non-hierarchical clustering procedure designed to fully exploit current database knowledge of sequences, structures and functions. The procedure increases the number of sequences that can be annotated (i.e., whose function can be inferred from database information) by transferring annotation from a set of 599 genomes, including Homo sapiens. When distantly related sequences from different genomes co-exist in the same cluster, the procedure allows safe transfer of annotation, for both structure and function, independently of the level of sequence identity. In some specific cases of functional annotation, human sequences can be safely modeled on prokaryotic templates. When computing these models, machine learning approaches to sequence analysis may help to constrain the optimal alignment of the distantly related sequences. Our analysis addresses the problem of structural transfer among distantly related proteins. It permits solutions that increase the structural coverage of the human proteome by some 6,000 models.
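The clustering and annotation-transfer idea summarized in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions. Pairwise alignment hits are assumed to be precomputed as (sequence, sequence, percent identity, percent coverage) tuples. Two sequences are linked only when both values pass illustrative thresholds (`min_identity`, `min_coverage`). Clusters are taken to be the connected components of the resulting graph, computed here with union-find. The function names, the threshold values and the `min_fraction` consensus rule for transferring labels are all hypothetical and are not the chapter's actual parameters.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, float, float]  # (seq_a, seq_b, % identity, % coverage)


def cluster_sequences(hits: Iterable[Hit],
                      min_identity: float = 40.0,   # hypothetical threshold
                      min_coverage: float = 90.0    # hypothetical threshold
                      ) -> List[Set[str]]:
    """Non-hierarchical clustering: connected components of the graph whose
    edges are the hits passing both thresholds (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        ra, rb = find(a), find(b)  # registers sequences even without a passing hit
        if identity >= min_identity and coverage >= min_coverage and ra != rb:
            parent[rb] = ra  # merge the two components

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for seq in list(parent):
        clusters[find(seq)].add(seq)
    return list(clusters.values())


def transfer_annotations(clusters: List[Set[str]],
                         annotations: Dict[str, Set[str]],
                         min_fraction: float = 0.5  # hypothetical consensus rule
                         ) -> Dict[str, Set[str]]:
    """Return the labels newly inferred for each sequence. A label is
    transferred to every cluster member if at least `min_fraction` of the
    annotated members carry it."""
    inferred: Dict[str, Set[str]] = {}
    for members in clusters:
        annotated = [m for m in members if annotations.get(m)]
        if not annotated:
            continue  # no annotated member to transfer from
        counts = Counter(label for m in annotated for label in annotations[m])
        consensus = {label for label, n in counts.items()
                     if n / len(annotated) >= min_fraction}
        for m in members:
            new_labels = consensus - annotations.get(m, set())
            if new_labels:
                inferred[m] = new_labels
    return inferred


if __name__ == "__main__":
    # Hypothetical identifiers and labels, for illustration only.
    hits = [
        ("human_a", "ecoli_b", 42.0, 95.0),
        ("ecoli_b", "bsub_c", 55.0, 92.0),
        ("human_d", "ecoli_b", 30.0, 98.0),  # fails the identity threshold
    ]
    annotations = {
        "ecoli_b": {"structure:template_X"},
        "bsub_c": {"structure:template_X", "function:term_Y"},
    }
    clusters = cluster_sequences(hits)
    for seq, labels in sorted(transfer_annotations(clusters, annotations).items()):
        print(seq, sorted(labels))
    # prints:
    # ecoli_b ['function:term_Y']
    # human_a ['function:term_Y', 'structure:template_X']
```

Cluster membership is transitive, so `human_a` ends up in the same cluster as `bsub_c` even though no direct hit links them. In this toy setting, that is how a cluster can bring together distantly related sequences from different genomes and let a human sequence inherit a structural template from prokaryotic members.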
Medicinal Chemistry in Drug Discovery. Design, Synthesis and Screening, pp. 197-218

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/145736