Enhanced protein isoform characterization through long-read proteogenomics

Rachel M. Miller; Ben T. Jordan; Madison M. Mehlferber; Erin D. Jeffery; Christina Chatzipantsiou; Simi Kaur; Robert J. Millikin; Yunxiang Dai; Simone Tiberi; Peter J. Castaldi; Michael R. Shortreed; Chance John Luckey; Ana Conesa; Lloyd M. Smith; Anne Deslattes Mays; Gloria M. Sheynkman

2022

Abstract

Background: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms.

Results: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis.

Conclusions: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.

Year: 2022
Journal: GENOME BIOLOGY
DOI: https://dx.doi.org/10.1186/s13059-022-02624-y
Citation: Miller, R.M., Jordan, B.T., et al. (2022). Enhanced protein isoform characterization through long-read proteogenomics. GENOME BIOLOGY, 23(1), 1-28 [10.1186/s13059-022-02624-y].
All authors: Rachel M. Miller, Ben T. Jordan, Madison M. Mehlferber, Erin D. Jeffery, Christina Chatzipantsiou, Simi Kaur, Robert J. Millikin, Yunxiang Dai, Simone...
Publication type: 1.01 Journal article

Files in this record:

File	Access	Description	Type	License	Size	Format
s13059-022-02624-y.pdf	Open access	Article	Publisher's version (PDF) / Version of Record	Open access license, Creative Commons Attribution (CC BY)	2.11 MB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/949883
