In this paper, we present COntext SImilarity-based data augmentation for NER (COSINER), a new method for improving Named Entity Recognition (NER) through data augmentation. Unlike current techniques, which may generate noisy and mislabeled samples through text manipulation, COSINER uses context similarity to replace entity mentions with more plausible ones, drawing on the available training data and on the contexts in which entities typically appear. Through experiments on popular benchmark datasets, we show that COSINER outperforms existing baselines in various few-shot scenarios where training data is limited. Additionally, our method's computing times are comparable to those of the simplest augmentation methods and lower than those of approaches that rely on pre-trained models in their architecture.

Data augmentation via context similarity: An application to biomedical Named Entity Recognition / Bartolini I.; Moscato V.; Postiglione M.; Sperlì G.; Vignali A. - In: INFORMATION SYSTEMS. - ISSN 0306-4379. - PRINT. - 119:102291 (2023), pp. 1-9. [10.1016/j.is.2023.102291]

Data augmentation via context similarity: An application to biomedical Named Entity Recognition

Bartolini I.;
2023

Bartolini I.; Moscato V.; Postiglione M.; Sperlì G.; Vignali A.
File in this record:
File: 11585 955111-Information_System_Journal-1.pdf (embargoed until 05/10/2024)
Type: Postprint
License: Open Access License. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 920.02 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/955111