In this paper, we present COntext SImilarity-based data augmentation for NER (COSINER), a new data augmentation method for improving Named Entity Recognition (NER). Unlike existing techniques, which may generate noisy and mislabeled samples through text manipulation, COSINER uses context similarity to replace entity mentions with more plausible ones, drawing on the available training data and on the contexts in which entities typically appear. Through experiments on popular benchmark datasets, we show that COSINER outperforms existing baselines in various few-shot scenarios where training data is limited. Additionally, our method's computing times are comparable to those of the simplest augmentation methods and shorter than those of approaches that rely on pre-trained models in their architecture.
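The abstract describes the mechanism only at a high level. The following minimal Python sketch shows the general idea of context-similarity-based mention replacement. It is not the COSINER implementation. The bag-of-words context window, the cosine similarity, the hypothetical helpers `extract_mentions`, `context_of`, `build_profiles` and `augment`, the `window` and `min_sim` parameters, and the toy BIO-tagged sentences are all illustrative assumptions.

```python
from collections import Counter, defaultdict
import math


def extract_mentions(sentence):
    """Return (start, end, type) spans for the entity mentions in a BIO-tagged sentence."""
    spans, start, etype = [], None, None
    for i, (_, tag) in enumerate(sentence):
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.append((start, i, etype))
            # A B- tag (or a stray I- tag) opens a new mention; an O tag closes it.
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    if start is not None:
        spans.append((start, len(sentence), etype))
    return spans


def context_of(sentence, start, end, window=2):
    """Bag of lower-cased words within `window` tokens left and right of a span."""
    left = [tok.lower() for tok, _ in sentence[max(0, start - window):start]]
    right = [tok.lower() for tok, _ in sentence[end:end + window]]
    return Counter(left + right)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profiles(corpus, window=2):
    """Aggregate, per (entity type, mention), the context words seen in the training data."""
    profiles = defaultdict(Counter)
    for sent in corpus:
        for start, end, etype in extract_mentions(sent):
            mention = tuple(tok for tok, _ in sent[start:end])
            profiles[(etype, mention)].update(context_of(sent, start, end, window))
    return profiles


def augment(sentence, profiles, window=2, min_sim=0.0):
    """Swap each mention for the same-type mention whose usual context is most similar."""
    out = list(sentence)
    # Work right to left so that replacing a span does not shift the indices of earlier spans.
    for start, end, etype in reversed(extract_mentions(sentence)):
        original = tuple(tok for tok, _ in sentence[start:end])
        ctx = context_of(sentence, start, end, window)
        candidates = [(cosine(ctx, prof), mention)
                      for (t, mention), prof in profiles.items()
                      if t == etype and mention != original]
        if not candidates:
            continue
        score, best = max(candidates)
        if score <= min_sim:
            continue  # no sufficiently plausible substitute: keep the original mention
        out[start:end] = [(tok, ("B-" if j == 0 else "I-") + etype) for j, tok in enumerate(best)]
    return out


# Hypothetical toy training data as (token, BIO tag) pairs.
corpus = [
    [("Patients", "O"), ("received", "O"), ("aspirin", "B-Chemical"), ("daily", "O")],
    [("Patients", "O"), ("received", "O"), ("ibuprofen", "B-Chemical"), ("daily", "O")],
    [("Mice", "O"), ("were", "O"), ("given", "O"), ("sodium", "B-Chemical"),
     ("chloride", "I-Chemical"), ("orally", "O")],
    [("Mice", "O"), ("were", "O"), ("given", "O"), ("saline", "B-Chemical"), ("orally", "O")],
]
profiles = build_profiles(corpus)
print(augment(corpus[0], profiles))
# [('Patients', 'O'), ('received', 'O'), ('ibuprofen', 'B-Chemical'), ('daily', 'O')]
```

This sketch aggregates contexts per distinct mention rather than per occurrence. That is one simple way to capture where an entity "typically appears." The paper's actual context representation and selection strategy may differ.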

Bartolini I., Moscato V., Postiglione M., Sperlì G., Vignali A. (2023). Data augmentation via context similarity: An application to biomedical Named Entity Recognition. Information Systems, 119, 102291, 1-9. doi:10.1016/j.is.2023.102291.

Data augmentation via context similarity: An application to biomedical Named Entity Recognition

Bartolini I.; Moscato V.; Postiglione M.; Sperlì G.; Vignali A.
2023

Files in this record:
File: 11585 955111-Information_System_Journal-1.pdf
Format: Adobe PDF, 920.02 kB
Open Access from 6 October 2024
Type: Postprint
License: Open Access license, Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/955111
Citations: Scopus 5; ISI 1