CRIS Current Research Information System



The (Undesired) Attenuation of Human Biases by Multilinguality

España-Bonet, Cristina; Barrón-Cedeño, Alberto

2022

Abstract

Some human preferences are universal. The odor of vanilla is perceived as pleasant all around the world. We expect neural models trained on human texts to exhibit these kinds of preferences, i.e. biases, but we show that this is not always the case. We explore 16 static and contextual embedding models in 9 languages and, when possible, compare them under similar training conditions. We introduce and release CA-WEAT, multilingual culturally aware tests to quantify biases, and compare them to previous English-centric tests. Our experiments confirm that monolingual static embeddings do exhibit human biases, but values differ across languages and are far from universal. Biases are less evident in contextual models, to the point that the original human association might be reversed. Multilinguality proves to be another variable that attenuates and even reverses the effect of the bias, especially in contextual multilingual models. To explain this variance among models and languages, we examine the effect of asymmetries in the training corpus, departures from isomorphism in multilingual embedding spaces, and discrepancies in the testing measures between languages.
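The abstract refers to quantifying biases with WEAT-style association tests. As a rough illustration only, the sketch below computes the standard WEAT effect size of Caliskan et al. (2017), which the name CA-WEAT suggests the new tests build on. It is not the authors' released code: the function names `cosine`, `association` and `weat_effect_size` are hypothetical, and the two-dimensional vectors standing in for flower/insect target words and pleasant/unpleasant attribute words are toy data, not real embeddings.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean cosine of w with attribute set A minus that with B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Effect size d for target sets X, Y and attribute sets A, B."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Normalise by the sample standard deviation over all target words
    # (assumption: some implementations use the population std instead).
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical 2-D toy vectors, not real embeddings.
A = [(1.0, 0.0), (0.9, 0.1)]   # "pleasant" attribute words
B = [(0.0, 1.0), (0.1, 0.9)]   # "unpleasant" attribute words
X = [(1.0, 0.1), (0.8, 0.2)]   # e.g. flower target words
Y = [(0.1, 1.0), (0.2, 0.8)]   # e.g. insect target words

d = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size d = {d:.3f}")
```

A positive d means the X targets sit closer to attribute set A than the Y targets do; a negative value would correspond to the reversed human association that the abstract reports for some contextual and multilingual models.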


Year: 2022
Volume title: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
First page: 2056
Last page: 2077
DOI: https://dx.doi.org/10.18653/v1/2022.emnlp-main.133
Citation: España-Bonet, C., Barrón-Cedeño, A. (2022). The (Undesired) Attenuation of Human Biases by Multilinguality. Toronto : Association for Computational Linguistics [10.18653/v1/2022.emnlp-main.133].
All authors: España-Bonet, Cristina; Barrón-Cedeño, Alberto
Appears in item types: 4.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
2022.emnlp-main.133.pdf (open access; type: publisher's version (PDF); license: Open Access License, Creative Commons Attribution (CC BY))	2.11 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/946516
