Motivation. Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologs. Recent progress in residue–residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps. Results. Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent Critical Assessment of Structure Prediction editions and over 27 000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare fold recognition performances of different alignment tools with their own similarity scores against those obtained using the congruence coefficient. We show that the congruence coefficient overall improves fold recognition over other methods, proving its effectiveness as a general similarity metric for protein map comparison. Availability and implementation. The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/.
Pietro Di Lena, Pierre Baldi (2020). CCpro.
CCpro
Pietro Di Lena;
2020
Abstract
Motivation. Protein fold recognition is a key step for template-based modeling approaches to protein structure prediction. Although closely related folds can be easily identified by sequence homology search in sequence databases, fold recognition is notoriously more difficult when it involves the identification of distantly related homologs. Recent progress in residue–residue contact and distance prediction opens up the possibility of improving fold recognition by using structural information contained in predicted distance and contact maps. Results. Here we propose to use the congruence coefficient as a metric of similarity between maps. We prove that this metric has several interesting mathematical properties which allow one to compute in polynomial time its exact mean and variance over all possible (exponentially many) alignments between two symmetric matrices, and assess the statistical significance of similarity between aligned maps. We perform fold recognition tests by recovering predicted target contact/distance maps from the two most recent Critical Assessment of Structure Prediction editions and over 27 000 non-homologous structural templates from the ECOD database. On this large benchmark, we compare fold recognition performances of different alignment tools with their own similarity scores against those obtained using the congruence coefficient. We show that the congruence coefficient overall improves fold recognition over other methods, proving its effectiveness as a general similarity metric for protein map comparison. Availability and implementation. The congruence coefficient software CCpro is available as part of the SCRATCH suite at: http://scratch.proteomics.ics.uci.edu/.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.