Despite the number of approaches recently proposed in NLP for detecting abusive language on social networks, developing hate speech detection systems that are robust across different platforms remains an unsolved problem. In this paper we perform a comparative evaluation on datasets for hate speech detection in Italian, extracted from four different social media platforms: Facebook, Twitter, Instagram and WhatsApp. We show that combining such platform-dependent datasets, so as to take advantage of training data developed for other platforms, is beneficial, although the impact varies depending on the social network under consideration.

Despite growing interest in approaches that use NLP to identify offensive language on social networks, the need to develop systems that maintain good performance across different platforms is still an open research topic. In this contribution we present a comparative evaluation on datasets for hate speech identification drawn from four different platforms: Facebook, Twitter, Instagram and WhatsApp. The study shows that combining different datasets to increase the training data improves classification performance, although the impact varies depending on the platform considered.

Cross-Platform Evaluation for Italian Hate Speech Detection

Michele Corazza
2019

CLiC-it 2019 - Italian Conference on Computational Linguistics, pp. 1-7
Michele Corazza, Stefano Menini, Elena Cabrio, Sara Tonelli, Serena Villata

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/801567