
COSINER: COntext SImilarity data augmentation for Named Entity Recognition

Ilaria Bartolini;Vincenzo Moscato;Marco Postiglione;Giancarlo Sperlì;Andrea Vignali

2022

Abstract

To alleviate the scarcity of manually annotated data in Named Entity Recognition (NER) tasks, data augmentation methods can be applied to automatically generate labeled data and improve the performance of existing methods. However, because they rely on manipulating the input text, current techniques may generate too many noisy and mislabeled samples. In this paper we propose COntext SImilarity-based data augmentation for NER (COSINER), a method for NER data augmentation based on context similarity: we replace entity mentions with the most plausible ones, based on the available training data and the contexts in which entities usually appear. We conduct experiments on popular benchmark datasets, showing that our method outperforms current baselines in various few-shot scenarios, where training data is assumed to be severely limited. Experimental results show that COSINER not only outperforms the baselines in terms of NER performance in highly limited scenarios (2%, 5%), but also has computing times comparable to those of the simplest augmentation methods.
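The general idea of replacing a mention with one that appears in a similar context can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how contexts are represented or compared, so the bag-of-words context window, the cosine similarity over token counts, the toy `TRAIN` data, and every function name below are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set: (tokens, BIO tags). Not from the paper.
TRAIN = [
    (["the", "patient", "was", "treated", "with", "aspirin", "daily"],
     ["O", "O", "O", "O", "O", "B-DRUG", "O"]),
    (["the", "patient", "was", "treated", "with", "ibuprofen", "daily"],
     ["O", "O", "O", "O", "O", "B-DRUG", "O"]),
    (["she", "was", "diagnosed", "with", "type", "2", "diabetes", "yesterday"],
     ["O", "O", "O", "O", "B-DISEASE", "I-DISEASE", "I-DISEASE", "O"]),
]


def extract_mentions(tags):
    """Return (start, end, type) spans decoded from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        inside = tag.startswith("I-") and start is not None and tag[2:] == etype
        if not inside:
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def context_vector(tokens, start, end, window=3):
    """Bag of words around a mention (assumed context representation)."""
    return Counter(tokens[max(0, start - window):start] + tokens[end:end + window])


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(dataset, window=3):
    """Collect (type, mention tokens, context vector) for every training mention."""
    index = []
    for tokens, tags in dataset:
        for start, end, etype in extract_mentions(tags):
            index.append((etype, tuple(tokens[start:end]),
                          context_vector(tokens, start, end, window)))
    return index


def augment(tokens, tags, index, window=3):
    """Swap each mention for the same-type mention with the most similar context."""
    new_tokens, new_tags = list(tokens), list(tags)
    # Go right to left so earlier span offsets stay valid after replacement.
    for start, end, etype in reversed(extract_mentions(tags)):
        ctx = context_vector(tokens, start, end, window)
        surface = tuple(tokens[start:end])
        candidates = [(cosine(ctx, c_ctx), c_surface)
                      for c_type, c_surface, c_ctx in index
                      if c_type == etype and c_surface != surface]
        if not candidates:
            continue
        score, best = max(candidates)
        if score <= 0.0:
            continue  # no candidate shares any context; leave the mention alone
        new_tokens[start:end] = best
        new_tags[start:end] = ["B-" + etype] + ["I-" + etype] * (len(best) - 1)
    return new_tokens, new_tags


index = build_index(TRAIN)
augmented_tokens, augmented_tags = augment(*TRAIN[0], index)
```

In this toy setup the labels are carried over with the replaced mention, so the generated sentence stays consistently annotated. Restricting candidates to the same entity type and requiring a non-zero similarity are design choices of the sketch, intended to mirror the abstract's goal of avoiding noisy or mislabeled samples.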

Year: 2022
Volume title: 15th International Conference on Similarity Search and Applications, SISAP 2022
First page: 11
Last page: 24
Citation: Ilaria Bartolini, V.M. (2022). COSINER: COntext SImilarity data augmentation for Named Entity Recognition.
All authors: Ilaria Bartolini, Vincenzo Moscato, Marco Postiglione, Giancarlo Sperlì, Andrea Vignali

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/894852
