Sezen Perçin, A.G. (2022). Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts. Association for Computational Linguistics.
Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts
Andrea Galassi; Francesca Lagioia; Federico Ruggeri; Piera Santin; Giovanni Sartor; Paolo Torroni
2022
Abstract
Creating balanced labeled textual corpora for complex tasks, like legal analysis, is a challenging and expensive process that often requires the collaboration of domain experts. To address this problem, we propose a data augmentation method based on the combination of GloVe word embeddings and the WordNet ontology. We present an example of application in the legal domain, specifically on decisions of the Court of Justice of the European Union. Our evaluation with human experts confirms that our method is more robust than the alternatives.

File | Size | Format | |
---|---|---|---|
_NLLP2022__Data_augmentation_for_Maxims.pdf (open access; type: preprint; license: Open Access License, Creative Commons Attribution (CC BY)) | 158.75 kB | Adobe PDF | View/Open |
Combining-WordNet-and-Word-Embeddings-in-Data-Augmentation-for-Legal-Texts.pdf (open access; type: publisher's version (PDF); license: Open Access License, Creative Commons Attribution (CC BY)) | 159.21 kB | Adobe PDF | View/Open |