Cross-lingual Annotation Projection in Legal Texts / Galassi Andrea, Drazewski Kasper, Lippi Marco, Torroni Paolo. - ELECTRONIC. - (2020), pp. 915-926. (Paper presented at The 28th International Conference on Computational Linguistics, held in Barcelona, Spain (Online), December 8-13, 2020) [10.18653/v1/2020.coling-main.79].
Cross-lingual Annotation Projection in Legal Texts
Galassi Andrea; Drazewski Kasper; Lippi Marco; Torroni Paolo
2020
Abstract
We study annotation projection in text classification problems where source documents are published in multiple languages and the versions may not be exact translations of one another. In particular, we focus on the detection of unfair clauses in privacy policies and terms of service. We present the first English-German parallel asymmetric corpus for the task at hand. We study and compare several language-agnostic sentence-level projection methods. Our results indicate that a combination of word embeddings and dynamic time warping performs best.

File | Size | Format |
---|---|---|
Cross-lingual-Annotation-Projection-in-Legal-Texts.pdf (open access). Description: Published paper. Type: Publisher's version (PDF). License: Open Access License, Creative Commons Attribution (CC BY) | 405.85 kB | Adobe PDF |