Detecting Vague Clauses in Italian Privacy Policies Using Transformers, LLMs, and Cross-Lingual Techniques

Giulia Grundler; Mariaceleste Musicco; Andrea Galassi; Francesca Lagioia; Ruta Liepina; Giorgio Resta; Sara Roccu; Giovanni Sartor; Paolo Torroni

2025

Abstract

Privacy policies often fall short of providing a comprehensive account of how personal data is used, thus failing to comply with GDPR requirements. By doing so, they hamper the users’ ability to make informed decisions about using services while ensuring that their data is used properly and fairly. This calls for automatic tools that can effectively identify potentially unlawful policies. Here we present a new corpus of Italian privacy policies, with clauses labelled by experts in data protection law, to indicate the level of comprehensiveness of information. We focus on the categories of data processed, classifying each clause as either sufficiently or insufficiently informative (“vague”). We perform six different classification and detection tasks, comparing the performance of BERT-based models and generative Large Language Models. Addressing multilingualism is crucial in the EU, whose 24 spoken languages are an integral part of its cultural heritage. Consequently, we also perform cross-language experiments to evaluate whether a pre-existing English corpus or classifiers can be leveraged for Italian and, vice versa, whether our corpus is informative enough to generalize to other languages.

Year: 2025
Volume title: Frontiers in Artificial Intelligence and Applications: ECAI 2025
First page: 4594
Last page: 4602
Series: Frontiers in Artificial Intelligence and Applications
DOI: https://dx.doi.org/10.3233/FAIA251362
Citation: Grundler, G., Musicco, M., Galassi, A., Lagioia, F., Liepina, R., Resta, G., et al. (2025). Detecting Vague Clauses in Italian Privacy Policies Using Transformers, LLMs, and Cross-Lingual Techniques [10.3233/FAIA251362].
All authors: Grundler, Giulia; Musicco, Mariaceleste; Galassi, Andrea; Lagioia, Francesca; Liepina, Ruta; Resta, Giorgio; Roccu, Sara; Sartor, Giovanni; Torroni, P...
Appears in types: 4.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
FAIA-413-FAIA251362.pdf (open access; type: publisher's version (PDF) / Version of Record; license: open access, Creative Commons Attribution - NonCommercial (CC BY-NC))	311.99 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1027194
