Grammatical-Restrained Hidden Conditional Random Fields for Bioinformatics applications

Fariselli, Piero; Savojardo, Castrense; Martelli, Pier Luigi; Casadio, Rita
2009

Abstract

Background: Discriminative models are designed to naturally address classification tasks. However, some applications require the inclusion of grammar rules, and in these cases generative models, such as Hidden Markov Models (HMMs) and Stochastic Grammars, are routinely applied.

Results: We introduce Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) as an extension of Hidden Conditional Random Fields (HCRFs). GRHCRFs, while preserving the discriminative character of HCRFs, can assign labels in agreement with the production rules of a defined grammar. The main novelty of GRHCRFs is that prior knowledge of the problem can be included in HCRFs by means of a defined grammar. Our current implementation allows regular grammar rules. We test our GRHCRF on a typical biosequence labeling problem: the prediction of the topology of prokaryotic outer-membrane proteins.

Conclusion: We show that in a typical biosequence labeling problem the GRHCRF performs better than CRF models of the same complexity, indicating that GRHCRFs can be useful tools for biosequence analysis applications.

Availability: GRHCRF software is available under the GPLv3 licence at http://www.biocomp.unibo.it/~savojard/biocrf-0.9.tar.gz.
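To make the idea of labels that agree with a regular grammar concrete, the following is a minimal sketch and not the authors' implementation. It assumes the grammar is given as a set of permitted label-to-label transitions and that some discriminative model has already produced per-position log-scores. It then runs a Viterbi-style search restricted to grammar-valid paths. The function name `constrained_viterbi`, the three-label toy alphabet (`i` inner loop, `T` transmembrane, `o` outer loop), the transition set, and the scores are all hypothetical and chosen only for illustration.

```python
import math

def constrained_viterbi(scores, labels, allowed, start, end):
    """Best label path that respects a regular grammar (illustrative sketch).

    scores  : list of dicts, one per position, mapping label -> log-score
    labels  : iterable of all labels
    allowed : set of (previous_label, next_label) pairs the grammar permits
    start   : labels permitted at the first position
    end     : labels permitted at the last position (assumed non-empty)
    """
    neg_inf = -math.inf
    # best[i][y]: best log-score of a grammar-valid path ending in y at position i
    best = [{y: (scores[0][y] if y in start else neg_inf) for y in labels}]
    back = []
    for i in range(1, len(scores)):
        row, ptr = {}, {}
        for y in labels:
            # Only predecessors permitted by the grammar are considered.
            cands = [(best[-1][p], p) for p in labels if (p, y) in allowed]
            prev_score, prev = max(cands, default=(neg_inf, None))
            row[y] = prev_score + scores[i][y]
            ptr[y] = prev
        best.append(row)
        back.append(ptr)
    final_score, last = max((best[-1][y], y) for y in labels if y in end)
    if final_score == neg_inf:
        raise ValueError("no label sequence satisfies the grammar")
    # Follow back-pointers from the last position to the first.
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy grammar (an assumption): a loop on one side of the membrane can only
# reach the other side through a transmembrane segment.
labels = ["i", "T", "o"]
allowed = {("i", "i"), ("i", "T"), ("T", "T"), ("T", "i"),
           ("T", "o"), ("o", "o"), ("o", "T")}
scores = [
    {"i": 0.0, "T": -2.0, "o": -3.0},
    {"i": -3.0, "T": -1.0, "o": -0.5},
    {"i": -3.0, "T": -2.0, "o": -0.1},
]
path = constrained_viterbi(scores, labels, allowed, start={"i"}, end={"i", "o"})
print(path)
```

In this toy input, picking the highest-scoring label at each position independently would give `i, o, o`, which the grammar forbids because `i` cannot be followed directly by `o`. The restricted search instead returns a path in which every consecutive pair of labels is a permitted transition.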
Use this identifier to cite or link to this document: http://hdl.handle.net/11585/79367