
On the upper bound of the prediction accuracy of residue contacts in proteins with correlated mutations: the case study of the similarity matrices

Di Lena, Pietro; Fariselli, Piero; Margara, Luciano; Vassura, Marco; Casadio, Rita
2009

Abstract

Correlated mutations in proteins are believed to occur in order to preserve the protein's functional fold through evolution. They can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein's three-dimensional structure. The correlation between pairs of residues is routinely evaluated with the Pearson correlation coefficient and the McLachlan similarity matrix. In this paper, we describe an optimization procedure that maximizes the correlation between the Pearson coefficient and protein residue contacts with respect to different similarity matrices, including random ones. Our results indicate that a large number of matrices perform equivalently to McLachlan's. We also find that the upper limit on the accuracy achievable in predicting protein residue contacts is independent of the optimized similarity matrix. This suggests that poor scoring may be due to the choice of a linear correlation function for evaluating correlated mutations.
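To make the scoring step concrete, the sketch below computes a Pearson-based correlated-mutation score for two alignment columns i and j, in the style commonly used with a substitution matrix: for every unordered pair of sequences (k, l), each column contributes the similarity of the residues that k and l carry there, and the score is the Pearson coefficient between the two resulting lists. This is a minimal sketch under stated assumptions, not the authors' implementation. `TOY_SIMILARITY` is a hypothetical four-letter matrix with made-up values (it is not the McLachlan matrix), and sequence weighting and gap handling are deliberately omitted.

```python
from itertools import combinations
from math import sqrt

# HYPOTHETICAL symmetric similarity scores over a reduced alphabet.
# Placeholder values only; this is NOT the McLachlan matrix.
TOY_SIMILARITY = {
    ("A", "A"): 5, ("C", "C"): 9, ("D", "D"): 6, ("E", "E"): 6,
    ("A", "C"): 1, ("A", "D"): 3, ("A", "E"): 3,
    ("C", "D"): 1, ("C", "E"): 1, ("D", "E"): 5,
}


def similarity(a, b, matrix):
    """Look up a symmetric score stored under either (a, b) or (b, a)."""
    key = (a, b) if (a, b) in matrix else (b, a)
    return matrix[key]  # raises KeyError if the pair is missing


def column_scores(column, matrix):
    """Similarity of every sequence pair (k, l), k < l, at one alignment column."""
    return [similarity(column[k], column[l], matrix)
            for k, l in combinations(range(len(column)), 2)]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A fully conserved column has no variance, hence no covariation signal.
        return 0.0
    return cov / sqrt(vx * vy)


def correlated_mutation_score(msa, i, j, matrix):
    """Pearson correlation between the pairwise similarity profiles of columns i and j."""
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    return pearson(column_scores(col_i, matrix), column_scores(col_j, matrix))


if __name__ == "__main__":
    # Columns 0 and 2 change together (A<->C alongside D<->E); column 1 is conserved.
    msa = ["AAD", "AAD", "CAE", "CAE"]
    print(correlated_mutation_score(msa, 0, 2, TOY_SIMILARITY))  # ≈ 0.926
    print(correlated_mutation_score(msa, 0, 1, TOY_SIMILARITY))  # 0.0
```

In this formulation the similarity matrix is the only free parameter of the score, which is why replacing it (with an optimized or a random matrix) is a natural way to probe how much of the prediction accuracy depends on the matrix rather than on the linear correlation function itself.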
Algorithms in Bioinformatics, pp. 62-72

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/80088
