
Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts

Sezen Perçin (first author), Andrea Galassi, Francesca Lagioia, Federico Ruggeri, Piera Santin, Giovanni Sartor, Paolo Torroni

2022

Abstract

Creating balanced labeled textual corpora for complex tasks, like legal analysis, is a challenging and expensive process that often requires the collaboration of domain experts. To address this problem, we propose a data augmentation method based on the combination of GloVe word embeddings and the WordNet ontology. We present an application in the legal domain, specifically to decisions of the Court of Justice of the European Union. Our evaluation with human experts confirms that our method is more robust than the alternatives.
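The abstract states the idea only at a high level. As an illustration of one common way to combine a lexical resource with word embeddings (not necessarily the authors' procedure), the sketch below draws synonym candidates from a WordNet-like lexicon and keeps only those whose embedding is close to the original word's in cosine similarity. `SYNONYMS`, `VECTORS`, `filtered_synonyms`, `augment`, the 0.9 threshold and the replacement probability `p` are all hypothetical; the toy tables stand in for WordNet and pretrained GloVe vectors so the example runs with the standard library only.

```python
import math
import random

# Hypothetical stand-ins: a tiny WordNet-like synonym table and tiny
# GloVe-like vectors. A real pipeline would query WordNet and load
# pretrained GloVe vectors instead of these toy values.
SYNONYMS = {
    "court": ["tribunal", "courtyard"],
    "decision": ["ruling", "determination"],
    "annul": ["nullify", "cancel"],
}

VECTORS = {
    "court": [0.9, 0.1, 0.0],
    "tribunal": [0.85, 0.15, 0.05],
    "courtyard": [0.1, 0.9, 0.2],
    "decision": [0.2, 0.1, 0.9],
    "ruling": [0.25, 0.1, 0.85],
    "determination": [0.4, 0.5, 0.5],
    "annul": [0.5, 0.0, 0.5],
    "nullify": [0.55, 0.05, 0.45],
    "cancel": [0.3, 0.6, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filtered_synonyms(word, threshold=0.9):
    """Return lexicon synonyms of `word` whose embedding similarity to `word`
    is at least `threshold`, sorted from most to least similar."""
    if word not in VECTORS:
        return []
    kept = []
    for cand in SYNONYMS.get(word, []):
        if cand in VECTORS:
            sim = cosine(VECTORS[word], VECTORS[cand])
            if sim >= threshold:
                kept.append((cand, sim))
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [cand for cand, _ in kept]


def augment(tokens, threshold=0.9, p=1.0, rng=None):
    """Build one augmented copy of `tokens`: each token that has at least one
    filtered synonym is replaced, with probability `p`, by a random one."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        cands = filtered_synonyms(tok.lower(), threshold)
        if cands and rng.random() < p:
            out.append(rng.choice(cands))
        else:
            out.append(tok)
    return out
```

For example, `augment(["The", "court", "issued", "a", "decision"])` yields `["The", "tribunal", "issued", "a", "ruling"]`: the embedding filter discards `courtyard`, a lexicon synonym of "court" drawn from an unrelated sense, and `determination`, which is too far from "decision" in the toy vector space. In a real pipeline the toy tables would be replaced by WordNet lookups and loaded GloVe vectors, and the threshold would need tuning.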

Year: 2022
Volume title: Proceedings of the Natural Legal Language Processing Workshop 2022
First page: 47
Last page: 52
DOI: https://dx.doi.org/10.18653/v1/2022.nllp-1.4
Citation: Sezen Perçin, A.G. (2022). Combining WordNet and Word Embeddings in Data Augmentation for Legal Texts. Association for Computational Linguistics [10.18653/v1/2022.nllp-1.4].
All authors: Sezen Perçin, Andrea Galassi, Francesca Lagioia, Federico Ruggeri, Piera Santin, Giovanni Sartor, Paolo Torroni
Publication type: 4.01 Contribution in conference proceedings

Files in this item:

File	Type	License	Size	Format
_NLLP2022__Data_augmentation_for_Maxims.pdf (open access)	Preprint / submitted version (before peer review)	Open Access license: Creative Commons Attribution (CC BY)	158.75 kB	Adobe PDF
Combining-WordNet-and-Word-Embeddings-in-Data-Augmentation-for-Legal-Texts.pdf (open access)	Publisher's PDF / Version of Record	Open Access license: Creative Commons Attribution (CC BY)	159.21 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/905768
