Noviello, Y., Tamburini, F. (2024). Exploring Text-Embedding Retrieval Models for the Italian Language. Aachen : CEUR Workshop Proceedings (CEUR-WS.org).

Exploring Text-Embedding Retrieval Models for the Italian Language

Noviello Y.; Tamburini F.
2024

Abstract

Text retrieval systems have become essential in the field of natural language processing (NLP), serving as the backbone for applications such as search engines, document indexing, and information retrieval. With the rise of generative AI, particularly Retrieval-Augmented Generation (RAG) systems, the demand for robust text retrieval models has increased. However, existing large language models (LLMs) and datasets are often insufficiently optimized for Italian, limiting their performance in Italian text retrieval tasks. This paper addresses this gap by proposing both a data collection and specialized models tailored for Italian text retrieval. Through extensive experimentation, we analyze the improvements and limitations in retrieval performance, paving the way for more effective Italian NLP applications.
Proceedings of the Tenth Italian Conference on Computational Linguistics (CLiC-it 2024), pp. 1-6.
Files in this record:
72_main_long.pdf (open access)
Description: Contribution in conference proceedings
Type: Publisher's version (PDF)
License: Open access license, Creative Commons Attribution (CC BY)
Size: 986.28 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1000663