


The Challenges of Creating a Parallel Multilingual Hate Speech Corpus: An Exploration

Katerina Korre; Arianna Muti; Alberto Barrón-Cedeño

2024

Abstract

Hate speech is notoriously one of the most demanding topics in Natural Language Processing: its multifaceted nature brings a number of challenges, such as multilinguality and cross-linguality. Hate speech also has a subjective aspect that intensifies across different cultures and languages. With this in mind, we design a pipeline to explore the possibility of creating a parallel multilingual hate speech dataset using machine translation. In this paper, we evaluate whether, and how, this is feasible by assessing the quality of the translations, calculating the toxicity levels of the original and target texts, and calculating correlations between the newly obtained scores. Finally, we perform a qualitative analysis to gain further semantic and grammatical insights. With this pipeline, we aim to explore ways of filtering hate speech texts in order to parallelize sentences in multiple languages, and to examine the challenges of the task.
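The score-comparison step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which toxicity scorer or which correlation measure the authors use, so everything here is an assumption: the scores are made-up values in [0, 1], Pearson correlation is one plausible choice of measure, and the helper names and the `max_gap` threshold are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def flag_divergent(source_tox, target_tox, max_gap=0.3):
    """Indices of sentence pairs whose toxicity score moved by more than max_gap.

    max_gap is a hypothetical threshold chosen only for illustration.
    """
    return [
        i
        for i, (s, t) in enumerate(zip(source_tox, target_tox))
        if abs(s - t) > max_gap
    ]


# Made-up toxicity scores for five source sentences and their translations.
source_tox = [0.91, 0.15, 0.67, 0.40, 0.88]
target_tox = [0.85, 0.10, 0.20, 0.45, 0.90]

r = pearson(source_tox, target_tox)
divergent = flag_divergent(source_tox, target_tox)
print(f"Pearson r = {r:.3f}; pairs to review: {divergent}")
```

A high correlation would suggest that translation broadly preserves toxicity, while the flagged pairs are the kind of cases a qualitative analysis might inspect by hand.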


Year: 2024
Volume title: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
First page: 15842
Last page: 15853
Citation: Korre, K., Muti, A., Barrón-Cedeño, A. (2024). The Challenges of Creating a Parallel Multilingual Hate Speech Corpus: An Exploration. ELRA and ICCL.
All authors: Korre, Katerina; Muti, Arianna; Barrón-Cedeño, Alberto
Appears in types: 4.01 Contribution in conference proceedings

Files in this record:

File	Size	Format
2024.lrec-main.1376.pdf	381.3 kB	Adobe PDF

Access: open access. Type: Publisher's version (PDF) / Version of Record. License: Open Access license, Creative Commons Attribution - NonCommercial (CC BY-NC).

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/973180
