From protein variations to biological processes and pathways with NET-GE

Bovo, Samuele; Di Lena, Pietro; Martelli, Pier Luigi; Fariselli, Piero; Casadio, Rita

Technologies capable of investigating organism complexity at different levels of resolution have led to an increase in genomic, proteomic and interactomic data. Genomic data are often generated by hospitals and used in clinical practice to better understand, at the molecular level, the origin of different pathologies. In this context, attention is now focused on the single individual, giving rise to the field of precision/genomic medicine, where each phenotype needs annotations that reconcile specific variations with common biological processes and pathways. To shed light on the molecular mechanisms and functions underlying such phenotypes, enrichment analysis is the most widely applied procedure. Several standard and network-based enrichment methods are currently available. However, standard enrichment methods rely only on the annotations that characterize the genes/proteins included in the input set, without considering them in the context of their interaction network. To address this limitation, we developed NET-GE [1], a novel NETwork-based Gene Enrichment tool for detecting the unifying biological processes and pathways, given a list of candidate proteins in which variations are present. Several approaches exploiting interaction networks for functional association analysis have emerged in the last few years. These methods may be classified into two main classes: A) methods that use the topology of the interaction network to infer how similar distinct sets of genes/proteins are, and B) methods that identify functionally related modules in interaction networks and then infer protein/gene biological roles from such modules. Among the available tools that perform network-based enrichment analysis, EnrichNet [2] and PINA [3] are two of the most cited methods, representative of classes A and B, respectively.
Our method falls within class B and is based on a pre-processing phase aimed at identifying interconnected and compact modules in a molecular interaction network. However, unlike the other approaches in class B, the modules found by our method are function-specific by construction, since they are built starting from seed sets collecting all the proteins related to a specific biological annotation. Briefly, NET-GE relies on the interactions among human proteins available in the STRING database. For each set of annotations (Gene Ontology [4], KEGG [5] and Reactome [6] pathways), proteins sharing the same annotation are collected in a seed set, which is then extended into a compact and connected sub-graph (module) of the molecular interaction network. The module is determined by computing all the shortest paths among the seeds and by reducing the resulting network to the minimal connecting network that preserves the distances among seeds. The minimal connecting network adds to the seeds a set of connecting nodes that are more reliably related to the reference annotation. The protein set to be analysed is mapped onto each sub-network, and a Fisher’s exact test determines whether there is a significant overlap between the input set and the network module. NET-GE implements both the standard and the network-based enrichment methods. A web server is available at: http://net-ge.biocomp.unibo.it/enrich. Search options include Gene Ontology terms, KEGG or Reactome pathways, and two different STRING networks. Results consist of significantly enriched annotations, which are also depicted graphically in the context of their relationships. For each annotation, the proteins composing the module are listed. Network-based enriched annotations are highlighted, and those not directly related to the input proteins are also reported separately. When tested on an OMIM-derived benchmark, our method is able to detect functional associations that are not detectable by standard enrichment.
Moreover, the newly enriched terms that are absent from the original annotations of the input genes are likely to provide new knowledge on the phenotype under examination. In conclusion, NET-GE is useful for highlighting new hypotheses on the molecular mechanisms underlying a given human phenotype. Furthermore, with our procedure it is possible to explore new genes/proteins in the subgraph-enriched network to help prioritize genetic variant discovery.
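The module construction and overlap test described above can be illustrated with a minimal Python sketch. It is not the NET-GE implementation: the function names, the adjacency-dictionary graph representation and the toy network are assumptions made for illustration only. The sketch collects every node lying on a shortest path between two seeds, but it omits the reduction to the minimal connecting network, and it computes a one-sided Fisher's exact test as the upper tail of the hypergeometric distribution.

```python
from collections import deque
from itertools import combinations
from math import comb


def bfs_distances(graph, start):
    """Unweighted shortest-path distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def shortest_path_nodes(graph, source, target):
    """Nodes lying on at least one shortest path between source and target."""
    d_src = bfs_distances(graph, source)
    if target not in d_src:
        return set()  # seeds in different connected components
    d_tgt = bfs_distances(graph, target)
    total = d_src[target]
    return {n for n in d_src if n in d_tgt and d_src[n] + d_tgt[n] == total}


def build_module(graph, seeds):
    """Seeds plus all nodes on shortest paths between seed pairs.

    Assumption: the further reduction to a minimal connecting network
    described in the text is NOT performed here.
    """
    module = set(seeds)
    for a, b in combinations(sorted(seeds), 2):
        module |= shortest_path_nodes(graph, a, b)
    return module


def enrichment_p_value(input_set, module, background_size):
    """One-sided Fisher's exact test: P(overlap >= observed) by chance."""
    k = len(input_set & module)   # observed overlap
    n = len(input_set)            # size of the input protein set
    m = len(module)               # size of the module
    total = comb(background_size, n)
    tail = sum(
        comb(m, i) * comb(background_size - m, n - i)
        for i in range(k, min(n, m) + 1)
    )
    return tail / total


# Hypothetical toy network: a chain A-B-C-D-E-F-G-H (undirected).
chain = "ABCDEFGH"
toy_graph = {node: set() for node in chain}
for left, right in zip(chain, chain[1:]):
    toy_graph[left].add(right)
    toy_graph[right].add(left)

toy_module = build_module(toy_graph, {"A", "C"})
toy_p = enrichment_p_value({"B", "C"}, toy_module, len(toy_graph))
```

In this toy case the seeds A and C are joined through the connecting node B, so an input protein such as B can contribute to the enrichment of an annotation it does not carry itself, which is the effect the network-based method is designed to capture.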

Bovo, S., Di Lena, P., Martelli, P.L., Fariselli, P., Casadio, R. (2016). From protein variations to biological processes and pathways with NET-GE. Kernel Press UG (haftungsbeschränkt).