Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data / do Valle, Ítalo Faria; Giampieri, Enrico; Simonetti, Giorgia; Padella, Antonella; Manfrini, Marco; Ferrari, Anna; Papayannidis, Cristina; Zironi, Isabella; Garonzi, Marianna; Bernardi, Simona; Delledonne, Massimo; Martinelli, Giovanni; Remondini, Daniel; Castellani, Gastone. - In: BMC BIOINFORMATICS. - ISSN 1471-2105. - ELETTRONICO. - 17:S12(2016), pp. 341.27-341.35. [10.1186/s12859-016-1190-7]

Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data

GIAMPIERI, ENRICO;SIMONETTI, GIORGIA;PADELLA, ANTONELLA;MANFRINI, MARCO;FERRARI, ANNA;PAPAYANNIDIS, CRISTINA;ZIRONI, ISABELLA;MARTINELLI, GIOVANNI;REMONDINI, DANIEL;CASTELLANI, GASTONE
2016

Abstract

Background: Detecting somatic mutations in whole-exome sequencing data from cancer samples has become a popular approach for profiling cancer development, progression and chemotherapy resistance. Several studies have proposed software packages, filters and parametrizations for this task, but many research groups have reported low concordance among methods. We aimed to develop a pipeline that detects a wide range of single nucleotide mutations with high validation rates. We combined two standard tools - Genome Analysis Toolkit (GATK) and MuTect - to create the GATK-LODN method. As proof of principle, we applied our pipeline to exome sequencing data from hematological (Acute Myeloid and Acute Lymphoblastic Leukemias) and solid (Gastrointestinal Stromal Tumor and Lung Adenocarcinoma) tumors. We also performed experiments on simulated data to test the sensitivity and specificity of the pipeline.

Results: MuTect had the highest validation rate (90 %) for mutation detection but detected a limited number of somatic mutations. GATK detected a high number of mutations but with low specificity. GATK-LODN improved the performance of GATK variant detection (from 5 of 14 to 3 of 4 confirmed variants) while preserving mutations not detected by MuTect. However, GATK-LODN filtered out more variants in the hematological samples than in the solid tumors. Experiments on simulated data showed that GATK-LODN increased both the specificity and the sensitivity of GATK results.

Conclusion: We presented a pipeline that detects a wide range of somatic single nucleotide variants, with good validation rates, from exome sequencing data of cancer samples. We also showed the advantage of combining standard algorithms to create the GATK-LODN method, which increased the specificity and sensitivity of GATK results. This pipeline can be helpful in discovery studies that aim to profile the somatic mutational landscape of cancer genomes.
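The validation figures quoted in the Results can be put on a common percentage scale with a short calculation. The sketch below is illustrative only: it assumes that "5 of 14" and "3 of 4" mean confirmed variants out of variants tested, and the helper name `validation_rate` is hypothetical, not part of the published pipeline.

```python
from fractions import Fraction


def validation_rate(confirmed: int, tested: int) -> Fraction:
    """Return the fraction of tested variants that were confirmed.

    Hypothetical helper for illustration; not part of the published pipeline.
    """
    if tested <= 0:
        raise ValueError("tested must be a positive integer")
    if not 0 <= confirmed <= tested:
        raise ValueError("confirmed must lie between 0 and tested")
    return Fraction(confirmed, tested)


# Counts as reported in the abstract (assumed to be confirmed / tested).
gatk = validation_rate(5, 14)
gatk_lodn = validation_rate(3, 4)

print(f"GATK:      {float(gatk):.1%}")       # 35.7%
print(f"GATK-LODN: {float(gatk_lodn):.1%}")  # 75.0%
```

Under that reading, the rate rises from about 36 % to 75 %, although both figures rest on small numbers of tested variants.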
Files in this record:
File: WES NGS pipeline_Remondini_BMCBioinformatics16.pdf
Access: open access
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution (CC BY)
Size: 629.42 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/572318
Citations
  • PMC 65
  • Scopus 84
  • ISI 81