Dependency Parsing is More Parameter-Efficient with Normalization

Gajo, Paolo; Rosati, Domenic; Sajjad, Hassan; Barrón-Cedeño, Alberto

2025

Abstract

Dependency parsing is the task of inferring natural language structure, often approached by modeling word interactions via attention through biaffine scoring. This mechanism works like self-attention in Transformers, where scores are calculated for every pair of words in a sentence. However, unlike Transformer attention, biaffine scoring does not use normalization prior to taking the softmax of the scores. In this paper, we provide theoretical evidence and empirical results revealing that a lack of normalization necessarily results in overparameterized parser models, where the extra parameters compensate for the sharp softmax outputs produced by high variance inputs to the biaffine scoring function. We argue that biaffine scoring can be made substantially more efficient by performing score normalization. We conduct experiments on semantic and syntactic dependency parsing in multiple languages, along with latent graph inference on non-linguistic data, using various settings of a k-hop parser. We train N-layer stacked BiLSTMs and evaluate the parser's performance with and without normalizing biaffine scores. Normalizing allows us to achieve state-of-the-art performance with fewer samples and trainable parameters. Code: https://github.com/paolo-gajo/EfficientSDP
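The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes a plain bilinear score s_ij = dep_i^T W head_j and assumes the normalization is a Transformer-style 1/sqrt(d) scaling, which the abstract does not specify. All function names, sizes, and the random inputs are hypothetical.

```python
import math
import random


def biaffine_scores(deps, heads, W, scale=1.0):
    """Score every (dependent, head) pair: s[i][j] = scale * deps[i]^T W heads[j]."""
    d_out = len(W[0])
    scores = []
    for dep in deps:
        # Project the dependent vector through W once per row.
        proj = [sum(dep[a] * W[a][b] for a in range(len(W))) for b in range(d_out)]
        scores.append([scale * sum(p * h for p, h in zip(proj, head)) for head in heads])
    return scores


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def mean_entropy(scores):
    """Average entropy of the row-wise softmax; low entropy means sharp outputs."""
    total = 0.0
    for row in scores:
        total += -sum(p * math.log(p) for p in softmax(row) if p > 0)
    return total / len(scores)


# Hypothetical toy setup: 6 words, 64-dimensional representations.
rng = random.Random(0)
n, d = 6, 64
deps = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
heads = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
W = [[rng.gauss(0, 1 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]

raw = biaffine_scores(deps, heads, W)                              # no normalization
scaled = biaffine_scores(deps, heads, W, scale=1 / math.sqrt(d))   # assumed 1/sqrt(d) scaling

raw_entropy = mean_entropy(raw)
scaled_entropy = mean_entropy(scaled)
print("mean softmax entropy, unnormalized:", raw_entropy)
print("mean softmax entropy, scaled:      ", scaled_entropy)
```

Under these assumptions, the unscaled scores have a variance that grows with d, so their softmax is sharper (lower entropy) than the scaled one. This is the behaviour the abstract says extra parameters otherwise have to compensate for.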

Year: 2025
Volume title: Advances in Neural Information Processing Systems 38
First page: 143266
Last page: 143296
Series: Advances in Neural Information Processing Systems
Citation: Gajo, P., Rosati, D., Sajjad, H., Barrón-Cedeño, A. (2025). Dependency Parsing is More Parameter-Efficient with Normalization. Curran Associates, Inc.
All authors: Gajo, Paolo; Rosati, Domenic; Sajjad, Hassan; Barrón-Cedeño, Alberto
Appears in types: 4.01 Contribution in conference proceedings

Files in this record:

File	Type	License	Size	Format
NeurIPS-2025-dependency-parsing-is-more-parameter-efficient-with-normalization-Paper-Conference.pdf	Publisher's version (PDF) / Version of Record	Open access, Creative Commons Attribution-NonCommercial (CC BY-NC)	766.13 kB	Adobe PDF
NeurIPS-2025-dependency-parsing-is-more-parameter-efficient-with-normalization-Supplemental-Conference(1).zip	Supplementary file	Open access, Creative Commons Attribution (CC BY)	64.76 kB	Zip File

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1061673
