Ilaria Bartolini, V.M. (2022). COSINER: COntext SImilarity data augmentation for Named Entity Recognition.

COSINER: COntext SImilarity data augmentation for Named Entity Recognition

Ilaria Bartolini;
2022

Abstract

To alleviate the scarcity of manually annotated data in Named Entity Recognition (NER) tasks, data augmentation methods can be applied to automatically generate labeled data and improve the performance of existing methods. However, because they rely on manipulating the input text, current techniques may generate too many noisy and mislabeled samples. In this paper we propose COntext SImilarity-based data augmentation for NER (COSINER), a method for NER data augmentation based on context similarity: we replace entity mentions with the most plausible ones, based on the available training data and the contexts in which entities usually appear. We conduct experiments on popular benchmark datasets, showing that our method outperforms current baselines in various few-shot scenarios, where training data is assumed to be severely limited. Experimental results show that COSINER not only outperforms the baselines in NER performance in highly limited scenarios (2%, 5%), but also has computing times comparable to those of the simplest augmentation methods.
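The core idea of replacing a mention with one that usually appears in a similar context can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how contexts are represented, so the bag-of-words context vectors, the two-token window, and the names `context`, `cosine`, `build_lexicon`, and `augment` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def context(tokens, start, end, window=2):
    """Bag-of-words counts of the tokens around a mention (assumed representation)."""
    nearby = tokens[max(0, start - window):start] + tokens[end:end + window]
    return Counter(t.lower() for t in nearby)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_lexicon(train):
    """Map (mention, label) to the summed context counts of all its occurrences."""
    lexicon = defaultdict(Counter)
    for tokens, spans in train:
        for start, end, label in spans:
            lexicon[(tuple(tokens[start:end]), label)] += context(tokens, start, end)
    return lexicon

def augment(tokens, span, lexicon, k=1):
    """Return up to k new (tokens, BIO tags) pairs for a single-mention sentence.

    The mention in `span` is swapped for the same-label mentions whose usual
    contexts are most similar to the context of this sentence.
    """
    start, end, label = span
    original = tuple(tokens[start:end])
    query = context(tokens, start, end)
    candidates = [
        (cosine(query, ctx), mention)
        for (mention, lab), ctx in lexicon.items()
        if lab == label and mention != original
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    samples = []
    for _, mention in candidates[:k]:
        new_tokens = tokens[:start] + list(mention) + tokens[end:]
        tags = ["O"] * len(new_tokens)
        tags[start] = "B-" + label
        for i in range(start + 1, start + len(mention)):
            tags[i] = "I-" + label
        samples.append((new_tokens, tags))
    return samples

# Toy training set (hypothetical data): token lists with (start, end, label) spans.
train = [
    ("the patient was treated with aspirin daily".split(), [(5, 6, "DRUG")]),
    ("she was treated with ibuprofen daily".split(), [(4, 5, "DRUG")]),
    ("he bought paracetamol at the store".split(), [(2, 3, "DRUG")]),
]
lexicon = build_lexicon(train)
for new_tokens, tags in augment(train[0][0], train[0][1][0], lexicon, k=1):
    print(" ".join(new_tokens), tags)
```

In this toy example "ibuprofen" is ranked above "paracetamol" as a replacement for "aspirin" because it shares the surrounding words "treated with ... daily". A real system would more likely compare dense contextual embeddings than raw word counts.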
15th International Conference on Similarity Search and Applications, SISAP 2022, pp. 11-24
Ilaria Bartolini, Vincenzo Moscato, Marco Postiglione, Giancarlo Sperlì, Andrea Vignali

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/894852