
Assessing the Reasoning Capabilities of LLMs in the context of Evidence-based Claim Verification

Federico Ruggeri (second author); Sebastian Löbbers (penultimate author); Yulan He (co-last author); Maria Liakata (co-last author)

2025

Abstract

Although LLMs have shown great performance on mathematics- and coding-related reasoning tasks, their capabilities regarding other forms of reasoning are still an open problem. Here, we examine the issue of reasoning from the perspective of claim verification. We propose a framework designed to break down any claim paired with evidence into the atomic reasoning types that are necessary for verification. We use this framework to create RECV, the first claim verification benchmark, incorporating real-world claims, to assess the deductive and abductive reasoning capabilities of LLMs. The benchmark comprises three datasets, covering reasoning problems of increasing complexity. We evaluate three state-of-the-art proprietary LLMs under multiple prompt settings. Our results show that while LLMs can address deductive reasoning problems, they consistently fail in cases of abductive reasoning. Moreover, we observe that enhancing LLMs with rationale generation is not always beneficial. Nonetheless, we find that generated rationales are semantically similar to those provided by humans, especially in deductive reasoning cases.

Year: 2025
Volume title: Findings of the Association for Computational Linguistics: ACL 2025
First page: 20604
Last page: 20628
DOI: https://dx.doi.org/10.18653/v1/2025.findings-acl.1059
Citation: Dougrez-Lewis, J., Elahi Akhter, M., Ruggeri, F., Löbbers, S., He, Y., Liakata, M. (2025). Assessing the Reasoning Capabilities of LLMs in the context of Evidence-based Claim Verification [10.18653/v1/2025.findings-acl.1059].
All authors: Dougrez-Lewis, John; Elahi Akhter, Mahmud; Ruggeri, Federico; Löbbers, Sebastian; He, Yulan; Liakata, Maria
Appears in types: 4.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
2025.findings-acl.1059.pdf (open access; description: paper; type: publisher's PDF / Version of Record; license: Open Access license, Creative Commons Attribution (CC BY))	1.12 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1023571
