Italiani, P., Moro, G., Ragazzi, L. (2025). Enhancing Legal Question Answering with Data Generation and Knowledge Distillation from Large Language Models. Artificial Intelligence and Law, 1-26. https://doi.org/10.1007/s10506-025-09463-9
Enhancing Legal Question Answering with Data Generation and Knowledge Distillation from Large Language Models
Paolo Italiani (co-first author); Gianluca Moro (co-first author); Luca Ragazzi (co-first author)
2025
Abstract
Legal question answering (LQA) relies on supervised methods to automatically handle law-related queries. These solutions require a substantial amount of carefully annotated data for training, which makes the process very costly. Although large language models (LLMs) show promise in zero-shot QA, their computational demands limit their practical use, making specialized small language models (SLMs) more favorable. Furthermore, interest in synthetic data generation has recently surged, spurred by the impressive generation capabilities of LLMs. This paper presents ACE-ATTORNEY, an LLM distillation approach devised to develop LQA data and supervised models without human annotation. Given a textual prompt, a frozen LLM generates artificial examples that are used as knowledge to train a student SLM with an order of magnitude fewer parameters. Taking into account a realistic retrieval-based scenario to fetch the correct document for answer generation, we propose the Selective Generative Paradigm, a novel approach designed to improve retrieval efficacy. Extensive experiments demonstrate the effectiveness and efficiency of distilled models on SYN-LEQA, our human-free synthetic dataset, and a public expert-annotated corpus. Notably, by using only a few dozen training samples, our best SLM achieves LLM-comparable performance with 1200% lower CO2 emissions. The data and the code to fully reproduce our results are available at https://github.com/disi-unibo-nlp/ace-attorney.
File: s10506-025-09463-9.pdf (1.63 MB, Adobe PDF). Open access; publisher's Version of Record; Creative Commons Attribution (CC BY) license.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


