Bartolini I., Moscato V., Postiglione M., Sperlì G., Vignali A. (2023). Data augmentation via context similarity: An application to biomedical Named Entity Recognition. Information Systems, 119, 102291, 1-9. doi:10.1016/j.is.2023.102291.
Data augmentation via context similarity: An application to biomedical Named Entity Recognition
Bartolini I.
2023
Abstract
In this paper, we present COntext SImilarity-based data augmentation for NER (COSINER), a new data augmentation method for improving performance on Named Entity Recognition (NER) tasks. Unlike current techniques, which may generate noisy and mislabeled samples through text manipulation, COSINER uses context similarity to replace entity mentions with more plausible ones drawn from the available training data, taking into account the contexts in which entities typically appear. Through experiments on popular benchmark datasets, we show that COSINER outperforms existing baselines in various few-shot scenarios where training data is limited. Additionally, the computing times of our method are comparable to those of the simplest augmentation methods and shorter than those of approaches that rely on pre-trained models in their architecture.

File | Type | License | Size | Format
---|---|---|---|---
11585 955111-Information_System_Journal-1.pdf (Open Access from 6 October 2024) | Postprint | Open Access license: Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) | 920.02 kB | Adobe PDF
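The abstract describes the core operation only at a high level: swap an entity mention for another mention of the same type from the training data whose usual context resembles the current one. The Python sketch below illustrates that general idea and is not the authors' COSINER implementation. The function names (`extract_mentions`, `build_profiles`, `augment`), the BIO tagging scheme, the two-token context window, the bag-of-words context vectors compared by cosine similarity, the `min_sim` threshold, and the toy sentences are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical sketch of context-similarity mention replacement (not COSINER itself).
# Sentences are lists of (word, tag) pairs; tags are assumed to be "O", "B-X", or "I-X".


def extract_mentions(sentence):
    """Return (start, end, type) spans of BIO-tagged entities; end is exclusive."""
    mentions, start, etype = [], None, None
    for i, (_, tag) in enumerate(sentence):
        # An O tag, a B- tag, or an I- tag of a different type closes the open span.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                mentions.append((start, i, etype))
                start, etype = None, None
            if tag != "O":
                start, etype = i, tag[2:]
    if start is not None:
        mentions.append((start, len(sentence), etype))
    return mentions


def context_bag(sentence, start, end, window):
    """Bag of lowercased words within `window` tokens on each side of a mention."""
    left = sentence[max(0, start - window):start]
    right = sentence[end:end + window]
    return Counter(word.lower() for word, _ in left + right)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_profiles(corpus, window=2):
    """Aggregate context words for every (entity type, surface form) seen in training."""
    profiles = defaultdict(Counter)
    for sentence in corpus:
        for start, end, etype in extract_mentions(sentence):
            surface = tuple(word for word, _ in sentence[start:end])
            profiles[(etype, surface)].update(context_bag(sentence, start, end, window))
    return profiles


def augment(sentence, profiles, window=2, min_sim=0.0):
    """Replace each mention with the same-type mention whose profile best matches its context."""
    out, cursor = [], 0
    for start, end, etype in extract_mentions(sentence):
        out.extend(sentence[cursor:start])  # copy the untouched tokens before the mention
        local = context_bag(sentence, start, end, window)
        current = tuple(word for word, _ in sentence[start:end])
        candidates = [(cosine(local, ctx), surface)
                      for (t, surface), ctx in profiles.items()
                      if t == etype and surface != current]
        best_score, best = max(candidates, default=(0.0, current))
        if best_score <= min_sim:
            best = current  # no sufficiently similar alternative: keep the original mention
        # Re-emit BIO tags so a replacement of a different length stays well-formed.
        out.extend((w, ("B-" if i == 0 else "I-") + etype) for i, w in enumerate(best))
        cursor = end
    out.extend(sentence[cursor:])
    return out


# Toy BIO-tagged training sentences (illustrative only, not from the paper's datasets).
train = [
    [("Aspirin", "B-Chemical"), ("reduces", "O"), ("fever", "B-Disease"), (".", "O")],
    [("Ibuprofen", "B-Chemical"), ("reduces", "O"), ("inflammation", "B-Disease"), (".", "O")],
    [("Patients", "O"), ("with", "O"), ("type", "B-Disease"), ("2", "I-Disease"),
     ("diabetes", "I-Disease"), ("received", "O"), ("metformin", "B-Chemical"), (".", "O")],
]
profiles = build_profiles(train)
print(augment(train[0], profiles))
# [('Ibuprofen', 'B-Chemical'), ('reduces', 'O'), ('inflammation', 'B-Disease'), ('.', 'O')]
```

The `min_sim` threshold trades augmentation coverage for plausibility: a mention with no sufficiently similar same-type alternative is left unchanged, which is what happens to both mentions in the third toy sentence.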
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.