Alternative splicing is a post-transcriptional event that increases transcriptome diversity. Our goal is to analyze putative splicing variants for their likelihood of becoming native mature proteins. Functional and structural annotation will be performed with a two-step procedure: 1) the sequence will be aligned against a database of alignment outputs derived from cross-genome comparisons of 800 whole genomes; 2) the sequence will be filtered through a pipeline that integrates different structure/function predictors, each taking a protein sequence as input and highlighting the relevant structural and functional features that native proteins carry. The first step is based on the notion that clustering proteins into functional and structural families may greatly benefit functional annotation; the second is based on the notion that predictors of structural and functional features generalize differently on the specific property they were trained on, and that integrating them with information from the available databases can lead to a complete annotation process. The platform will integrate predictors specifically developed for eukaryotic proteomes and capable of predicting different structural/functional features. These will include predictors of signal peptides, GPI anchors, disulfide bridges, subcellular localization and others. A first discrimination will distinguish between globular and membrane proteins. All variants will then be structurally modeled when possible. Finally, all variants will also be annotated with respect to their possible involvement in genetic disorders. The platform will therefore constitute a seamless system that can process whole proteomes on demand. The final results will be available as databases or can be browsed through a web application from which all the data (primary and derived) can be retrieved.
This means that biosequences, annotations, files and documents will be managed by a single, rational and unifying multipurpose system. Our main effort will be devoted to integrating the different input/output data required to run programs in cascade in a coherent and fault-tolerant way. The infrastructure will be implemented using Python and Plone-based technology and/or by adopting parts of, or ideas derived from, the Taverna workbench, a free software tool that aims to provide a language and software tools to facilitate easy use of workflow and distributed compute technology. The platform will therefore allow a large-scale automatic annotation process that will produce a database of annotated protein variants. A novelty will be the evaluation, for each variant, of a reliability index of its being a native protein. This will be achieved by developing an expert system that uses the ENCODE database as a reference set.
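
As an illustration only, running predictors in cascade in a fault-tolerant way could be sketched in Python as follows. Every name here (`Annotation`, `run_cascade`, and the three toy predictors) is a hypothetical placeholder; the toy rules stand in for the real predictors, which the abstract does not specify, and nothing below should be read as the project's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Annotation:
    """Accumulates features and failures for one protein variant (hypothetical)."""
    sequence: str
    features: Dict[str, object] = field(default_factory=dict)
    errors: Dict[str, str] = field(default_factory=dict)


Predictor = Callable[[Annotation], object]


def toy_protein_class(ann: Annotation) -> str:
    # Placeholder rule, NOT a real predictor: call the protein "membrane"
    # if it contains a run of at least 20 hydrophobic residues.
    hydrophobic = set("AILMFVW")
    run = 0
    for residue in ann.sequence:
        run = run + 1 if residue in hydrophobic else 0
        if run >= 20:
            return "membrane"
    return "globular"


def toy_cysteine_count(ann: Annotation) -> int:
    # Placeholder for a disulfide-bridge predictor: only counts cysteines.
    return ann.sequence.count("C")


def toy_unavailable(ann: Annotation) -> object:
    # Simulates a predictor that fails at run time.
    raise RuntimeError("predictor unavailable")


def run_cascade(sequence: str, steps: List[Tuple[str, Predictor]]) -> Annotation:
    """Run predictors in order; a failing step is recorded and skipped."""
    ann = Annotation(sequence)
    for name, predictor in steps:
        try:
            # Later predictors can read the features set by earlier ones.
            ann.features[name] = predictor(ann)
        except Exception as exc:
            ann.errors[name] = str(exc)
    return ann


steps = [
    ("protein_class", toy_protein_class),
    ("cysteines", toy_cysteine_count),
    ("unavailable", toy_unavailable),
]
result = run_cascade("MKC" + "L" * 20 + "CC", steps)
print(result.features, result.errors)
```

Recording failures per step, rather than aborting, is one possible reading of "fault-tolerant": a whole-proteome run can then finish and the failed steps can be retried later.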

Large-scale analysis of alternative splicing in the human transcriptome using computational and experimental approaches

CASADIO, RITA
In press

2011

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/145738