Background: Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools. Methods: We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts. Results: This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation. Conclusion: Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.

Exploring the Boundaries: Gene and Protein Identification in Biomedical Text / Finkel J.; Dingare S.; Manning C.D.; Nissim M.; Alex B.; Grover C. - In: BMC BIOINFORMATICS. - ISSN 1471-2105. - ELECTRONIC. - 6(Suppl 1):(2005).

Exploring the Boundaries: Gene and Protein Identification in Biomedical Text

NISSIM, MALVINA
2005

Finkel J.; Dingare S.; Manning C.D.; Nissim M.; Alex B.; Grover C.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/41494

Citations
  • PubMed Central: ND
  • Scopus: 72
  • ISI Web of Science: 49