Most existing natural language processing systems for legal texts are developed for the English language. Nevertheless, there are several application domains where multiple versions of the same documents are provided in different languages, especially inside the European Union. One notable example is Terms of Service (ToS). In this paper, we compare different approaches to the task of detecting potentially unfair clauses in ToS across multiple languages. In particular, after developing an annotated corpus and a machine learning classifier for English, we consider and compare several strategies to extend the system to other languages: building a novel corpus and training a novel machine learning system for each language, from scratch; projecting annotations across documents in different languages, to avoid the creation of novel corpora; translating training documents while keeping the original annotations; translating queries at prediction time and relying on the English system only. An extended experimental evaluation conducted on a large, original dataset indicates that the time-consuming task of re-building a novel annotated corpus for each language can often be avoided with no significant degradation in terms of performance.

Unfair clause detection in terms of service across multiple languages / Galassi, Andrea; Lagioia, Francesca; Jabłonowska, Agnieszka; Lippi, Marco. - In: ARTIFICIAL INTELLIGENCE AND LAW. - ISSN 0924-8463. - ELECTRONIC. - in press:(2024), pp. 1-49. [10.1007/s10506-024-09398-7]

Unfair clause detection in terms of service across multiple languages

Galassi, Andrea (co-first author); Lagioia, Francesca (co-first author)
2024


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/967073