Background: Discriminative models are designed to naturally address classification tasks. However, some applications require the inclusion of grammar rules, and in these cases generative models, such as Hidden Markov Models (HMMs) and Stochastic Grammars, are routinely applied. Results: We introduce Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) as an extension of Hidden Conditional Random Fields (HCRFs). GRHCRFs, while preserving the discriminative character of HCRFs, can assign labels in agreement with the production rules of a defined grammar. The main novelty of GRHCRFs is that prior knowledge of the problem can be included in HCRFs by means of a defined grammar. Our current implementation allows regular grammar rules. We test our GRHCRF on a typical biosequence labeling problem: the prediction of the topology of prokaryotic outer-membrane proteins. Conclusion: We show that in a typical biosequence labeling problem the GRHCRF performs better than CRF models of the same complexity, indicating that GRHCRFs can be useful tools for biosequence analysis applications. Availability: GRHCRF software is available under the GPLv3 licence at the website http://www.biocomp.unibo.it/~savojard/biocrf-0.9.tar.gz.
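To make the idea of assigning labels "in agreement with the production rules of a defined grammar" concrete, here is a minimal sketch of grammar-restricted Viterbi decoding. It is not the GRHCRF training or inference procedure from the paper. The four hidden states (`i`, `Tu`, `o`, `Td`), their mapping to labels, the transition table `ALLOWED`, and the function name `constrained_viterbi` are all hypothetical, chosen only to resemble an outer-membrane topology grammar. The per-position log-scores are assumed to come from some already trained discriminative model.

```python
import math

# Hypothetical regular grammar, written as the set of allowed state-to-state moves.
# i = inner loop, Tu = strand crossing outward, o = outer loop, Td = strand crossing inward.
ALLOWED = {
    "i": {"i", "Tu"},
    "Tu": {"Tu", "o"},
    "o": {"o", "Td"},
    "Td": {"Td", "i"},
}
# Several hidden states may share one visible label (both strand states emit "T").
STATE_LABEL = {"i": "i", "Tu": "T", "o": "o", "Td": "T"}
# Assumption for this sketch: every sequence starts in the inner loop.
START_STATES = {"i"}


def constrained_viterbi(scores):
    """Return the best label sequence that the grammar permits.

    scores: list of dicts, one per position, mapping state -> log-score.
    """
    states = list(ALLOWED)
    best = [{s: (scores[0][s] if s in START_STATES else -math.inf) for s in states}]
    back = []
    for t in range(1, len(scores)):
        row, ptr = {}, {}
        for s in states:
            # Only predecessors that the grammar allows may reach state s.
            cands = [(best[-1][p], p) for p in states if s in ALLOWED[p]]
            prev_score, prev = max(cands)
            row[s] = prev_score + scores[t][s]
            ptr[s] = prev
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return [STATE_LABEL[s] for s in path]
```

Because forbidden transitions are never considered, the decoded path cannot jump from an inner loop directly to an outer loop, even when the per-position scores alone would favour that labeling.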
Fariselli P., Savojardo C., Martelli P.L., Casadio R. (2009). Grammatical-Restrained Hidden Conditional Random Fields for Bioinformatics applications. ALGORITHMS FOR MOLECULAR BIOLOGY, 4, 13 [10.1186/1748-7188-4-13].
Grammatical-Restrained Hidden Conditional Random Fields for Bioinformatics applications
FARISELLI, PIERO;SAVOJARDO, CASTRENSE;MARTELLI, PIER LUIGI;CASADIO, RITA
2009