1001 Optimal PDB Structure Alignments: Integer Programming Methods for Finding the Maximum Contact Map Overlap

CAPRARA, ALBERTO
2004

Abstract

Structure comparison is a fundamental problem for structural genomics, with applications to drug design, protein fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures for this problem. In this paper we describe the developments of the last few years in the study of an emerging measure for protein structure comparison, the Contact Map Overlap (CMO). This measure compares two protein structures by comparing their contact maps. A contact map is a list of pairs of residues which lie in 3-dimensional proximity in the protein's native fold. Although the measure is in principle computationally hard to optimize, we have shown that it can in fact be computed with great accuracy for related proteins of the sizes of interest (about 150 residues and 250 contacts) by Integer Linear Programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds on the optimal alignment value. Effective heuristics, such as Local Search and Genetic Algorithms, have also been developed. We were able to obtain, for the first time, optimal alignments for large similar proteins (about 1000 residues and 2000 contacts), and we used the CMO measure to cluster proteins into families. The clusters obtained were compared to the SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments within 10% of the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact, and how to choose this threshold in a sensible way.
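To make the two definitions in the abstract concrete, the following is a minimal sketch of a contact map and of the overlap value of one given alignment. It is not the authors' method: it only scores an alignment that is supplied by hand and does not search for the optimal one, which is what the Integer Linear Programming techniques address. The function names, the use of one coordinate per residue, the 5.0 distance threshold, and the `min_separation` filter that skips neighbours along the chain are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def contact_map(coords, threshold, min_separation=2):
    """Return the set of residue pairs (i, j), i < j, lying within `threshold`.

    `coords` holds one 3-D point per residue (an assumption of this sketch).
    Pairs closer than `min_separation` along the chain are skipped.
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_separation and dist(coords[i], coords[j]) <= threshold:
            contacts.add((i, j))
    return contacts


def overlap(contacts_a, contacts_b, alignment):
    """Count contacts of A that the alignment maps onto contacts of B.

    `alignment` maps residue indices of A to residue indices of B and must
    preserve residue order (no crossings).
    """
    pairs = sorted(alignment.items())
    if any(b1 >= b2 for (_, b1), (_, b2) in zip(pairs, pairs[1:])):
        raise ValueError("alignment must be order-preserving")
    return sum(
        1
        for i, j in contacts_a
        if i in alignment
        and j in alignment
        and (alignment[i], alignment[j]) in contacts_b
    )


# Toy data (hypothetical): A is a square of four residues; B is the same
# square with one extra residue prepended to the chain.
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
coords_b = [(-3.8, 0.0, 0.0)] + coords_a

map_a = contact_map(coords_a, threshold=5.0)
map_b = contact_map(coords_b, threshold=5.0)

shifted = {0: 1, 1: 2, 2: 3, 3: 4}    # skips the extra residue of B
unshifted = {0: 0, 1: 1, 2: 2, 3: 3}  # aligns by position only

score_shifted = overlap(map_a, map_b, shifted)
score_unshifted = overlap(map_a, map_b, unshifted)
```

In this toy case each structure has a single contact, and only the shifted alignment maps one onto the other. The CMO value of two structures is the maximum of this count over all order-preserving alignments. Changing `threshold` changes both contact maps, which is the sensitivity the abstract's final sentence refers to.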
A. Caprara; R. Carr; S. Istrail; G. Lancia; B. Walenz

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/16990