Korre, A., Basile, A., Yenikent, S., Spallaccia, B., Franco-Salvador, M., Barrón-Cedeño, A. (2025). Examining Inferred Author and Textual Correlates of Harmful Language Annotation. Language Resources and Evaluation, 1-32.

Examining Inferred Author and Textual Correlates of Harmful Language Annotation

Aikaterini Korre;Beatrice Spallaccia;Alberto Barrón-Cedeño
2025

Abstract

This study examines whether the psycholinguistic and demographic characteristics of authors of online texts are correlated with the way harmful language, such as toxicity and hate speech, is judged. We apply artificial intelligence models to two harmful language datasets, Jigsaw’s Special Rater Pool dataset and the Measuring Hate Speech dataset, to generate probabilities for different text aspects, namely inferring demographic information of the author behind the suspicious text in terms of age and gender, as well as the expressed emotions, emotionality, sentiment and communication style. We then perform a statistical regression analysis to examine how these text aspects correlate with the perception of hate speech and toxicity during the annotation process. The study shows that while the frequency of the psycholinguistic text aspects that can be derived from the author’s personality does not differ significantly between harmful and non-harmful classes, the inferred text aspects are statistically associated with the annotators’ perception of harmful language and could potentially influence the way annotators label the texts.
Korre, Aikaterini; Basile, Angelo; Yenikent, Seren; Spallaccia, Beatrice; Franco-Salvador, Marc; Barrón-Cedeño, Alberto

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1016508