CRIS Current Research Information System

Combining shallow and deep learning approaches against data scarcity in legal domains

Sovrano F. (first author); Palmirani M.; Vitali F.

2022

Abstract

We have recently witnessed a radical shift towards digitisation in many aspects of daily life, including law, public administration and governance. This shift has often been driven by the aim of reducing costs and human errors through better data analysis and management, but it has also raised major technological challenges. One of these challenges is the need to cope with relatively small amounts of data without sacrificing performance. Indeed, cutting-edge approaches to natural language processing and understanding are often data-hungry, especially those based on deep learning. With this paper we seek to address the problem of data scarcity in the automatic processing and understanding of Legalese (or legal English). We propose an ensemble of shallow and deep learning techniques called SyntagmTuner, designed to combine the accuracy of deep learning with the ability of shallow learning to work with little data. Our contribution is based on the assumption that Legalese differs from spoken language in the way meaning is encoded by the structure of the text and the co-occurrence of words. As a result, we show with SyntagmTuner how important tasks for e-governance, such as multi-label classification of United Nations General Assembly (UNGA) Resolutions or legal question answering, can be performed with datasets of roughly 100 samples or even fewer.

Year: 2022
Journal: Government Information Quarterly
DOI: https://dx.doi.org/10.1016/j.giq.2022.101715
Citation: Sovrano, F., Palmirani, M., Vitali, F. (2022). Combining shallow and deep learning approaches against data scarcity in legal domains. Government Information Quarterly, 39(3), 1-13 [10.1016/j.giq.2022.101715].
All authors: Sovrano, F.; Palmirani, M.; Vitali, F.
Appears in types: 1.01 Journal article

Files in this record:

File	Size	Format
manuscript.pdf	773.91 kB	Adobe PDF

Access: Open Access since 1 August 2024
Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review
License: Open Access licence. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/889786
