Genomics, with the large amount of heterogeneous data it is generating, is opening many interesting practical and theoretical computational problems; one of them is the search, along the whole genome, for collections of genomic regions at given distances from each other, i.e., a pattern of genomic regions. In this paper we present an optimized pattern-search algorithm that efficiently finds, within a large set of genomic data, sequences of genomic regions that are similar to a given pattern. We start with a base version of the problem, which is solved using dynamic programming enhanced with an efficient window-based technique; then, we extend the algorithm to more complex scenarios with practical applications in revealing interesting and unknown regions of the genome, thus making it an important ingredient in supporting biological research. We apply our algorithm to enhancer detection, a relevant biological problem, showing that the method is both efficient and accurate.
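To make the base problem concrete, the sketch below shows one simplified way to combine dynamic programming with a distance window when matching a pattern of regions. It is our own hypothetical formulation, not the algorithm from the paper: regions are reduced to sorted start coordinates on a single chromosome, a pattern is a list of expected gaps between consecutive regions, each gap may deviate by at most `tol`, and the cost of a match is assumed to be the sum of absolute gap deviations. The names `best_pattern_match`, `gaps` and `tol` are illustrative only.

```python
from bisect import bisect_left, bisect_right
from math import inf


def best_pattern_match(positions, gaps, tol):
    """Find the lowest-cost occurrence of a gap pattern among genomic regions.

    Hypothetical simplified formulation (not the paper's algorithm):
      positions: sorted start coordinates of candidate regions on one chromosome
      gaps: expected distances between consecutive pattern regions
      tol: maximum deviation allowed for each gap (half-width of the window)
    The cost of a match is the sum of |observed gap - expected gap|.
    Returns (cost, region_indices), or None if no match fits the windows.
    """
    n = len(positions)
    if n == 0:
        return None
    # cost[i] = best cost of matching the pattern prefix so that it ends at region i.
    cost = [0.0] * n
    back = []  # back[j][i] = predecessor of region i when matching gap j
    for g in gaps:
        new_cost = [inf] * n
        prev = [-1] * n
        for i, p in enumerate(positions):
            # Window: only predecessors k < i with p - positions[k] in [g - tol, g + tol]
            # are examined; binary search finds them because positions are sorted.
            lo = bisect_left(positions, p - g - tol, 0, i)
            hi = bisect_right(positions, p - g + tol, 0, i)
            for k in range(lo, hi):
                if cost[k] == inf:
                    continue
                c = cost[k] + abs((p - positions[k]) - g)
                if c < new_cost[i]:
                    new_cost[i], prev[i] = c, k
        cost = new_cost
        back.append(prev)
    end = min(range(n), key=lambda i: cost[i])
    if cost[end] == inf:
        return None
    # Trace predecessors back from the last matched region.
    indices = [end]
    for prev in reversed(back):
        indices.append(prev[indices[-1]])
    indices.reverse()
    return cost[end], indices
```

For example, with positions `[100, 250, 400, 1000, 1150, 1320]`, gaps `[150, 170]` and `tol=20`, the call returns `(0.0, [3, 4, 5])`, i.e. the regions starting at 1000, 1150 and 1320. The window restricts each dynamic-programming step to predecessors within the allowed tolerance, so dense but distant regions are never compared.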

Montanari, P., Bartolini, I., Ciaccia, P., Patella, M., Ceri, S., Masseroli, M. (2016). Pattern similarity search in genomic sequences. IEEE Transactions on Knowledge and Data Engineering, 28(11), 3053-3067. doi:10.1109/TKDE.2016.2595582

Pattern similarity search in genomic sequences

Montanari, Piero; Bartolini, Ilaria; Ciaccia, Paolo; Patella, Marco; Ceri, Stefano; Masseroli, Marco
2016

Files in this item:
TKDE-2016bmp.pdf (open access; type: postprint; license: free open-access license; Adobe PDF, 1.55 MB)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/563517