Cuccarini, M., Draetta, L., Fiumanò, B., Bistarelli, S., Damiano, R., Presutti, V. (2025). Unveiling Stereotypes: Combining Knowledge Graphs and LLMs for Implied Stereotype Generation.
Unveiling Stereotypes: Combining Knowledge Graphs and LLMs for Implied Stereotype Generation
Beatrice Fiumanò; Valentina Presutti
2025
Abstract
In recent years, hate speech detection models have achieved significantly improved results, largely due to advances in Large Language Models (LLMs). As a result, research has increasingly focused on more nuanced phenomena, such as the detection of implicit hate and stereotypes. Although the challenge of identifying implicit language has been widely explored, it remains an open issue for state-of-the-art models due to their limited ability to grasp contextual and culturally specific knowledge. In this work, we address the task of identifying stereotypes implicitly encoded in hate speech messages, and propose a method for generating them by leveraging the combined potential of LLMs and Knowledge Graphs (KGs). As a first step, we designed an ontology specifically tailored to represent implicit hate speech. We then populated the ontology using a subset of an Italian-language hate speech dataset in which targets and implied stereotype statements were manually annotated. The remaining portion of the dataset was reserved as a test set to evaluate the impact of knowledge graph-derived information on LLM-generated stereotypes. For each input sentence, relevant knowledge was extracted from the ontology using SPARQL queries and used to enrich the prompt provided to various LLMs. We compared the results of the knowledge-enhanced approach against those of a baseline few-shot learning approach. Evaluation was conducted using the BLEU, BERTScore, and ROUGE metrics. Additionally, given the high subjectivity of the task, we performed a manual qualitative analysis on a subset of the model outputs to assess both the quality of the evaluation and the soundness of the generated stereotypes.
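The abstract outlines a retrieve-then-prompt pipeline: annotated targets and implied stereotypes populate the ontology, and a SPARQL query retrieves the stereotypes linked to a message's target so they can enrich the LLM prompt. The minimal sketch below illustrates one plausible way to implement this step with rdflib; the namespace, property names (ex:hasTarget, ex:impliesStereotype), file name, and prompt wording are assumptions made for illustration and do not come from the paper's released resources.

# Sketch of the knowledge-enhanced prompting step (not the authors' code).
# Assumed ontology vocabulary: ex:hasTarget and ex:impliesStereotype.
from rdflib import Graph, Literal

g = Graph()
g.parse("implicit_hate_ontology.ttl")  # hypothetical file name for the populated KG

SPARQL = """
PREFIX ex: <http://example.org/implicit-hate#>
SELECT DISTINCT ?stereotype WHERE {
    ?msg ex:hasTarget ?target ;
         ex:impliesStereotype ?stereotype .
    FILTER(LCASE(STR(?target)) = LCASE(STR(?targetLabel)))
}
"""

def retrieve_stereotypes(target_label: str, limit: int = 5) -> list[str]:
    """Return up to `limit` stereotype statements annotated for the given target."""
    rows = g.query(SPARQL, initBindings={"targetLabel": Literal(target_label)})
    return [str(row.stereotype) for row in rows][:limit]

def build_prompt(sentence: str, target_label: str) -> str:
    """Enrich the prompt with KG-derived stereotypes about the message's target."""
    context = "\n".join(f"- {s}" for s in retrieve_stereotypes(target_label))
    return (
        "Known stereotypes about this target (from the knowledge graph):\n"
        f"{context or '(no KG matches)'}\n\n"
        f"Message: {sentence}\n"
        "State the stereotype implied by the message:"
    )

The same prompt skeleton, minus the KG-derived context and with a handful of annotated examples instead, would serve as the few-shot baseline the abstract compares against.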


