
Harmful Language Datasets: An Assessment of Robustness

Korre, Aikaterini; Pavlopoulos, John; Sorensen, Jeffrey; Laugier, Léo; Androutsopoulos, Ion; Dixon, Lucas; Barrón-Cedeño, Alberto
2023

Abstract

The automated detection of harmful language has become increasingly important for the online world, especially with the growth of social media and, consequently, of polarisation. There are many open challenges to high-quality detection of harmful text, from dataset creation to generalisable application, calling for more systematic studies. In this paper, we explore re-annotation as a means of examining the robustness of existing labelled datasets, showing that, despite the use of alternative definitions, inter-annotator agreement remains very inconsistent, highlighting the intrinsically subjective and variable nature of the task. In addition, we build automatic toxicity detectors using the existing datasets with their original labels, and we evaluate them on our multi-definition and multi-source datasets. Surprisingly, while other studies show that hate speech detection models perform better on data drawn from the same distribution as the training set, our analysis demonstrates that this is not necessarily the case.
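
As an illustrative sketch only (the record itself contains no code, and the variable names and label values below are hypothetical), the kind of inter-annotator agreement discussed in the abstract could be quantified with a chance-corrected measure such as Cohen's kappa; the paper may use a different measure.

    # Hedged sketch: chance-corrected agreement between two hypothetical
    # annotators assigning binary toxicity labels (1 = toxic, 0 = not toxic).
    # The label sequences are invented for illustration only.
    from sklearn.metrics import cohen_kappa_score

    annotator_a = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical re-annotation
    annotator_b = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical re-annotation

    kappa = cohen_kappa_score(annotator_a, annotator_b)
    print(f"Cohen's kappa: {kappa:.2f}")  # value in [-1, 1]

Values near 1 indicate strong agreement, while values near 0 indicate agreement no better than chance.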
The 7th Workshop on Online Abuse and Harms (WOAH), 2023, pages 221–230
Korre, A., Pavlopoulos, J., Sorensen, J., Laugier, L., Androutsopoulos, I., Dixon, L., et al. (2023). Harmful Language Datasets: An Assessment of Robustness. Toronto: Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.woah-1.24
Files in this record:

2023.woah-1.24.pdf

Access: open access
Type: Publisher's version (PDF)
Licence: Open Access licence. Creative Commons Attribution (CC BY)
Size: 525.12 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/946522
Citations
  • Scopus: 1