Vazzana, G., Manfredi, M., Savojardo, C., Martelli, P.L., Casadio, R. (2026). Embedding-Based Alignments Capture Structural and Sequence Domains of Distantly Related Multifunctional Human Proteins. COMPUTATION, 14(1), 1-13 [10.3390/computation14010025].
Embedding-Based Alignments Capture Structural and Sequence Domains of Distantly Related Multifunctional Human Proteins
Vazzana G.; Manfredi M.; Savojardo C.; Martelli P. L.; Casadio R.
2026
Abstract
A protein embedding is a representation of a protein that carries information distilled from large volumes of sequences stored in large archives. Typically, the protein is represented as a matrix in which each residue is a context-specific vector whose dimension reflects the size of the large neural network architecture (a transformer) trained with deep learning algorithms on those sequences. A recently introduced method, Embedding-Based Alignment (EBA), is particularly suited to pairwise comparisons of embeddings and, as we report here, enables remote homolog detection under specific constraints, including similar protein sequence lengths. Multifunctional proteins occur in many species. In humans in particular, their structural and functional annotation is an urgent problem since, according to recent statistics, they account for up to 50% of the human reference proteome. In this paper we show that, when EBA is applied to a set of randomly selected multifunctional human proteins, it retrieves, after a clustering procedure and rigorous validation on the reference Swiss-Prot database, proteins that are remote homologs of each other and share structural and functional features with the query protein.

| File | Size | Format |
|---|---|---|
| computation-14-00025(1).pdf (open access; publisher's PDF, Version of Record; license: Creative Commons Attribution (CC BY)) | 2.34 MB | Adobe PDF |
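To make the residue-level representation and the idea of a pairwise embedding comparison more concrete, here is a minimal sketch in Python (standard library only). It assumes the per-residue embeddings are already available as lists of vectors (in practice they would come from a protein language model, which is not shown). It scores residue pairs with cosine similarity, applies a hypothetical length-ratio filter (`min_ratio=0.8` is an illustrative placeholder, not the threshold used in the paper), and aligns the two proteins with a plain Needleman-Wunsch global alignment with a linear gap penalty. The function names are hypothetical, and this is a generic embedding-alignment sketch, not EBA's published scoring scheme, which may differ (for example, in how the similarity matrix is computed or normalized).

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two residue vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(emb_a: Sequence[Vector], emb_b: Sequence[Vector]) -> Matrix:
    """Entry [i][j] compares residue i of protein A with residue j of protein B."""
    return [[cosine(x, y) for y in emb_b] for x in emb_a]


def length_compatible(len_a: int, len_b: int, min_ratio: float = 0.8) -> bool:
    """Hypothetical length filter: keep pairs whose shorter/longer length ratio is >= min_ratio."""
    if len_a <= 0 or len_b <= 0:
        return False
    return min(len_a, len_b) / max(len_a, len_b) >= min_ratio


def global_alignment(sim: Matrix, gap: float = -0.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Needleman-Wunsch global alignment on a precomputed similarity matrix.

    Uses a linear gap penalty (assumption). Returns the alignment score and
    the aligned residue index pairs (i, j).
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    # F[i][j] = best score aligning the first i residues of A with the first j of B
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sim[i - 1][j - 1],  # align residue i-1 of A with j-1 of B
                F[i - 1][j] + gap,                    # residue i-1 of A against a gap
                F[i][j - 1] + gap,                    # residue j-1 of B against a gap
            )
    # Traceback from the bottom-right cell, preferring matches on ties
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real per-residue embeddings are much larger.
    protein_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    protein_b = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    if length_compatible(len(protein_a), len(protein_b)):
        score, aligned = global_alignment(similarity_matrix(protein_a, protein_b))
        print(f"score={score:.3f}, aligned pairs={aligned}")
```

The length filter is applied before alignment only because the abstract states that EBA detects remote homologs under constraints that include similar sequence lengths; the ratio used here is a placeholder, and the clustering and Swiss-Prot validation steps described in the paper are not modeled.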
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.



