Baldelli, D., Jiang, J., Aizawa, A., Torroni, P. (2024). TWOLAR: A Two-Step LLM-Augmented Distillation Method for Passage Reranking. Springer Science and Business Media Deutschland GmbH. doi:10.1007/978-3-031-56027-9_29.
TWOLAR: A Two-Step LLM-Augmented Distillation Method for Passage Reranking
Baldelli D.; Torroni P.
2024
Abstract
In this paper, we present TWOLAR: a two-stage pipeline for passage reranking based on the distillation of knowledge from Large Language Models (LLMs). TWOLAR introduces a new scoring strategy and a distillation process centered on the creation of a novel and diverse training dataset. The dataset consists of 20K queries, each associated with a set of documents retrieved via four distinct retrieval methods to ensure diversity, and then reranked by exploiting the zero-shot reranking capabilities of an LLM. Our ablation studies demonstrate the contribution of each new component we introduced. Our experimental results show that TWOLAR significantly enhances the document reranking ability of the underlying model, matching and in some cases even outperforming state-of-the-art models with three orders of magnitude more parameters on the TREC-DL test sets and the zero-shot evaluation benchmark BEIR. To facilitate future work, we release our dataset, finetuned models, and code (Code: https://github.com/Dundalia/TWOLAR; Models and Dataset: https://huggingface.co/Dundalia).