Galassi Andrea, D.K. (2020). Cross-lingual Annotation Projection in Legal Texts. International Committee on Computational Linguistics [10.18653/v1/2020.coling-main.79].
Cross-lingual Annotation Projection in Legal Texts
Galassi Andrea; Torroni Paolo
2020
Abstract
We study annotation projection in text classification problems where source documents are published in multiple languages and may not be exact translations of one another. In particular, we focus on the detection of unfair clauses in privacy policies and terms of service. We present the first English-German parallel asymmetric corpus for the task at hand. We study and compare several language-agnostic sentence-level projection methods. Our results indicate that a combination of word embeddings and dynamic time warping performs best.

File | Size | Format |
---|---|---|
Cross-lingual-Annotation-Projection-in-Legal-Texts.pdf (open access; description: Published paper; type: publisher's version (PDF); license: Open Access License, Creative Commons Attribution (CC BY)) | 405.85 kB | Adobe PDF |
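The abstract reports that combining word embeddings with dynamic time warping (DTW) performed best for sentence-level projection. This record does not describe the paper's actual pipeline, so the following is only a generic sketch of the idea: align two sentence sequences with DTW over cosine distances between sentence vectors, then copy each source label to the target sentences it is aligned with. The function names (`cosine_distance`, `dtw_align`, `project_labels`) and the two-dimensional toy vectors are illustrative assumptions, not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two vectors (1.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def dtw_align(src_vecs, tgt_vecs):
    """Return a minimum-cost monotonic alignment as a list of (i, j) pairs."""
    n, m = len(src_vecs), len(tgt_vecs)
    inf = float("inf")
    # cost[i][j] = cheapest way to align the first i source and j target sentences
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(src_vecs[i - 1], tgt_vecs[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last cell to recover the alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # advance both sequences
            (cost[i - 1][j], i - 1, j),          # advance source only
            (cost[i][j - 1], i, j - 1),          # advance target only
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def project_labels(src_labels, path, n_tgt):
    """Mark a target sentence as positive if any aligned source sentence is."""
    tgt_labels = [0] * n_tgt
    for i, j in path:
        if src_labels[i]:
            tgt_labels[j] = 1
    return tgt_labels


# Toy, made-up sentence vectors: 3 source sentences, 4 target sentences.
src_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt_vecs = [[1.0, 0.1], [0.1, 1.0], [0.0, 1.0], [1.0, 0.9]]
src_labels = [0, 1, 0]  # hypothetical: only the second source sentence is flagged

path = dtw_align(src_vecs, tgt_vecs)
tgt_labels = project_labels(src_labels, path, len(tgt_vecs))
print(path)
print(tgt_labels)
```

In this toy case the second source sentence is aligned with two target sentences, so both inherit its label. That one-to-many behaviour is what makes DTW a natural fit when the two language versions are not sentence-for-sentence translations.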
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.