Motivation: Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologs. Recent progress in residue–residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps.

Results: Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent Critical Assessment of Structure Prediction editions and over 27 000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare fold recognition performances of different alignment tools with their own similarity scores against those obtained using the congruence coefficient. We show that the congruence coefficient overall improves fold recognition over other methods, proving its effectiveness as a general similarity metric for protein map comparison.

Availability and implementation: The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/.
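As a rough illustration of the similarity metric named in the Results above, the sketch below computes the standard (Tucker) congruence coefficient between two equally sized matrices and applies it to the sub-maps selected by an alignment. It is not the CCpro implementation: the function names, the list-of-lists map representation, the toy contact maps, and the zero-norm convention are all assumptions made for this example, and the polynomial-time mean and variance computation described in the abstract is not reproduced here.

```python
import math


def congruence_coefficient(a, b):
    """Tucker's congruence coefficient: sum(a*b) / sqrt(sum(a*a) * sum(b*b))."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    cross = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = sum(x * x for row in a for x in row)
    norm_b = sum(y * y for row in b for y in row)
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption of this sketch: an all-zero map scores 0
    return cross / math.sqrt(norm_a * norm_b)


def submatrix(m, idx):
    """Rows and columns of the square matrix m restricted to positions idx."""
    return [[m[i][j] for j in idx] for i in idx]


def score_alignment(target, template, pairs):
    """Score an alignment given as (target_position, template_position) pairs."""
    t_idx = [t for t, _ in pairs]
    s_idx = [s for _, s in pairs]
    return congruence_coefficient(submatrix(target, t_idx),
                                  submatrix(template, s_idx))


# Hypothetical toy contact maps (1 = contact), symmetric with zero diagonal.
target = [[0, 1, 0],
          [1, 0, 1],
          [0, 1, 0]]
template = [[0, 1, 0, 0],
            [1, 0, 1, 0],
            [0, 1, 0, 1],
            [0, 0, 1, 0]]

# Two candidate alignments of the three target residues onto the template.
score_matching = score_alignment(target, template, [(0, 0), (1, 1), (2, 2)])
score_shifted = score_alignment(target, template, [(0, 0), (1, 2), (2, 3)])
```

In this toy case the first alignment selects a template sub-map identical to the target, so it scores higher than the second one. For non-negative maps such as contact or distance maps the coefficient lies between 0 and 1.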

Fold recognition by scoring protein maps using the congruence coefficient / Di Lena, Pietro; Baldi, Pierre. - In: BIOINFORMATICS. - ISSN 1367-4803. - ELECTRONIC. - 37:4(2021), pp. 506-513. [10.1093/bioinformatics/btaa833]

Fold recognition by scoring protein maps using the congruence coefficient

Di Lena, Pietro; Baldi, Pierre
2021

Files in this item:
BIOINF-2020-0951.R2_Proof_hi.pdf (open access)
Type: Postprint
License: Free open-access license
Size: 378.1 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/811547