Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines

Gianmarco Pappacoda; Andrea Galassi; Roberto Navigli; Paolo Torroni
2025

Abstract

The evaluation of large language models for Italian faces unique challenges due to morphosyntactic complexity, dialectal variation, culture-specific knowledge, and limited availability of computational resources. This position paper presents a comprehensive framework for Italian LLM benchmarking, in which we identify key dimensions for LLM evaluation, including linguistic capabilities, knowledge domains, task types, and prompt variations, and propose high-level methodological guidelines for current and future initiatives. We advocate a community-driven, sustainable benchmarking initiative that incorporates dynamic dataset management, open model prioritization, and collaborative infrastructure utilization. Our framework aims to establish a coordinated effort within the Italian NLP community to ensure rigorous, scientifically sound evaluation practices that can adapt to the evolving landscape of Italian LLMs.
Proceedings of the Eleventh Italian Conference on Computational Linguistics (CLiC-it 2025), pp. 747-759
Moroni, L., Pappacoda, G., Barba, E., Conia, S., Galassi, A., Magnini, B., et al. (2025). Sustainable Italian LLM Evaluation: Community Perspectives and Methodological Guidelines. CEUR Workshop Proceedings.
Moroni, Luca; Pappacoda, Gianmarco; Barba, Edoardo; Conia, Simone; Galassi, Andrea; Magnini, Bernardo; Navigli, Roberto; Torroni, Paolo; Zanoli, Rober...
Files in this record:
70_main_long.pdf
Access: open access
Type: Publisher's version (PDF) / Version of Record
License: Open access license. Creative Commons Attribution (CC BY)
Size: 1.23 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1031375