Overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims. Task 2: Factuality

Barron-Cedeno A.; Da San Martino G.
2018

Abstract

We present an overview of the CLEF-2018 CheckThat! Lab on Automatic Identification and Verification of Political Claims, with a focus on Task 2: Factuality. The task asked participants to assess whether a given check-worthy claim made by a politician in the context of a debate or speech is factually true, half-true, or false. For data, we focused on debates from the 2016 US Presidential Campaign, as well as on some speeches during and after the campaign (we also provided translations into Arabic), and we relied on comments and factuality judgments from factcheck.org and snopes.com, which we further refined manually. A total of 30 teams registered to participate in the lab, and five of them submitted runs. The most successful approaches relied on the automatic retrieval of evidence from the Web: similarities and other relationships between the claim and the retrieved documents were used as input to classifiers in order to make a decision. The best-performing official submissions achieved a mean absolute error of 0.705 and 0.658 on the English and Arabic test sets, respectively. This leaves plenty of room for improvement, and thus we release all datasets and the scoring scripts, which should enable further research in fact-checking.
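To make the reported metric concrete, the following is a minimal sketch of how a mean absolute error could be computed over the three factuality labels. It is not the lab's released scoring script: the mapping of labels to the ranks 0, 1, 2, the names `LABEL_TO_RANK` and `mean_absolute_error`, and the toy gold and predicted labels are all assumptions made for illustration.

```python
# Illustrative sketch only; not the official CheckThat! scoring script.
# Assumption: the three labels sit on an ordinal scale with unit spacing.
LABEL_TO_RANK = {"false": 0, "half-true": 1, "true": 2}


def mean_absolute_error(gold, predicted):
    """Average absolute distance between gold and predicted label ranks."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one claim is required")
    total = sum(
        abs(LABEL_TO_RANK[g] - LABEL_TO_RANK[p])
        for g, p in zip(gold, predicted)
    )
    return total / len(gold)


# Toy data (hypothetical, not taken from the lab's datasets).
gold = ["true", "false", "half-true", "false"]
predicted = ["half-true", "false", "false", "true"]

mae = mean_absolute_error(gold, predicted)

# A trivial baseline that always answers "half-true".
baseline_mae = mean_absolute_error(gold, ["half-true"] * len(gold))
```

Under this assumed scale, confusing true with false costs 2 and an adjacent confusion costs 1, so the metric rewards near misses in a way plain accuracy does not. A system that always predicts the middle label never errs by more than 1 per claim, which is a useful reference point when reading error values on such a scale.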
CEUR Workshop Proceedings, pp. 1-13
Barron-Cedeno A.; Elsayed T.; Suwaileh R.; Marquez L.; Atanasova P.; Zaghouani W.; Kyuchukov S.; Da San Martino G.; Nakov P.
Files in this item:
File: invited_paper_14.pdf
Access: open access
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution (CC BY)
Size: 337.51 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/709188