One of the primary challenges in human genetics is determining the functional impact of single nucleotide variants (SNVs) and insertions and deletions (InDels), whether coding or noncoding. Several methods have been developed to detect disease-related single amino acid changes, but only some can assess the impact of noncoding variants. CADD is the most widely used and advanced algorithm for predicting the diverse effects of genome variations. It combines sequence conservation with functional features derived from ENCODE project data. Using CADD requires downloading a large set of pre-calculated data during installation. To streamline variant annotation, we developed PhD-SNPg, a lightweight, easy-to-install machine-learning tool that relies solely on sequence-based features. Here we present an updated version, trained on a larger dataset, that can also predict the impact of InDels. Despite its simplicity, PhD-SNPg performs similarly to CADD, making it well suited for rapid genome interpretation and as a benchmark for tool development.

PhD-SNPg: updating a webserver and lightweight tool for scoring nucleotide variants / Capriotti, Emidio; Fariselli, Piero. - In: NUCLEIC ACIDS RESEARCH. - ISSN 1362-4962. - ELECTRONIC. - 51:W1(2023), pp. W451-W458. [10.1093/nar/gkad455]

PhD-SNPg: updating a webserver and lightweight tool for scoring nucleotide variants

Capriotti, Emidio (first author); 2023

Capriotti, Emidio; Fariselli, Piero
Files in this item:
File: capriotti-nar2023.pdf
Access: open access
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution - NonCommercial (CC BY-NC)
Size: 776.52 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/947634