Biomedical Named Entity Recognition (BioNER) faces significant challenges in real-world applications due to limited annotated data and the constant emergence of new entity types, making zero-shot learning capabilities crucial. While Large Language Models (LLMs) possess the extensive domain knowledge needed for specialized fields like biomedicine, their computational costs often make them impractical. To address these challenges, we introduce OpenBioNER, a lightweight BERT-based cross-encoder architecture that can identify any biomedical entity using only its description, eliminating the need for retraining on new, unseen entity types. Through comprehensive evaluation on established biomedical benchmarks, we demonstrate that OpenBioNER surpasses state-of-the-art baselines, including specialized 7B NER LLMs and GPT-4o, achieving up to 10% higher F1 scores while using only 110M parameters. Moreover, OpenBioNER outperforms existing small-scale models that match textual spans with entity types rather than descriptions, in terms of both accuracy and computational efficiency.
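
The abstract describes a cross-encoder that reads an entity type description together with the text to be tagged, but it does not spell out the input format. The following is a minimal sketch of how such a pairing might look, assuming the usual BERT sentence-pair layout (`[CLS] description [SEP] sentence [SEP]`) and per-token binary labels. The function names, the whitespace tokenization, and the hand-written labels are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

CLS, SEP = "[CLS]", "[SEP]"


def build_cross_encoder_input(description: str, sentence: str) -> Tuple[List[str], List[int]]:
    """Pair an entity-type description with a sentence in one sequence.

    Returns the token sequence and segment ids (0 = description part,
    1 = sentence part). Whitespace splitting stands in for a real
    subword tokenizer (assumption for illustration only).
    """
    desc_tokens = description.split()
    sent_tokens = sentence.split()
    tokens = [CLS] + desc_tokens + [SEP] + sent_tokens + [SEP]
    segment_ids = [0] * (len(desc_tokens) + 2) + [1] * (len(sent_tokens) + 1)
    return tokens, segment_ids


def decode_spans(sent_tokens: List[str], labels: List[int]) -> List[str]:
    """Merge consecutive tokens labelled 1 into entity mentions."""
    spans: List[str] = []
    current: List[str] = []
    for token, label in zip(sent_tokens, labels):
        if label == 1:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    tokens, segments = build_cross_encoder_input(
        "A chemical compound or drug", "Aspirin reduces fever"
    )
    print(tokens)
    print(segments)
    # Labels are written by hand here; a trained model would predict them.
    print(decode_spans(["Aspirin", "reduces", "fever"], [1, 0, 0]))
```

Under this assumed layout, supporting a new entity type only requires changing the description string, which is consistent with the abstract's claim that no retraining is needed for unseen types.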

Cocchieri, A., Frisoni, G., Martinez Galindo, M., Moro, G., Tagliavini, G., Candoli, F. (2025). OpenBioNER: Lightweight Open-Domain Biomedical Named Entity Recognition Through Entity Type Description.

OpenBioNER: Lightweight Open-Domain Biomedical Named Entity Recognition Through Entity Type Description

Alessio Cocchieri (co-first author);
Giacomo Frisoni (co-first author);
Gianluca Moro (co-first author);
Giuseppe Tagliavini (penultimate author);
2025

Findings of the Association for Computational Linguistics: NAACL 2025, pp. 1-20
Cocchieri, Alessio; Frisoni, Giacomo; Martinez Galindo, Marcos; Moro, Gianluca; Tagliavini, Giuseppe; Candoli, Francesco
Files in this item:
File: 2025.findings-naacl.47.pdf
Access: open access
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution (CC BY)
Size: 680.11 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1009721