CRIS Current Research Information System


Leveraging online user feedback to improve statistical machine translation / Formiga, Lluís and Barrón-Cedeño, Alberto and Màrquez, Lluís and Henríquez, C.A. and Mariño, J.B. - In: THE JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH. - ISSN 1076-9757. - Electronic. - 54:(2015), pp. 159-192. [10.1613/jair.4716]

Leveraging online user feedback to improve statistical machine translation

Formiga, Lluís; Barrón-Cedeño, Alberto; Màrquez, Lluís; Henríquez, C. A.; Mariño, J. B.

2015

Abstract

In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to increased lexical coverage, but also to better lexical selection, reordering, and morphology. We also show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again yields a significant improvement in the translation quality of a general-purpose baseline SMT system.
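The second step (pivot-based alignment) can be illustrated with a minimal sketch, assuming that the system's own translation serves as the pivot: source-to-translation links, which a phrase-based decoder can report, are composed with monolingual links between the system translation and the user edit. Here Python's `difflib.SequenceMatcher` stands in for the monolingual aligner. That choice, the all-to-all linking of replaced spans, and the function names `monolingual_alignment` and `pivot_alignment` are illustrative assumptions, not the authors' implementation.

```python
from difflib import SequenceMatcher


def monolingual_alignment(mt_tokens, edit_tokens):
    """Align system-output tokens to user-edited tokens.

    difflib is an assumed stand-in for a proper edit-distance aligner.
    Returns a set of (mt_index, edit_index) links.
    """
    links = set()
    matcher = SequenceMatcher(a=mt_tokens, b=edit_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged tokens align one-to-one.
            links.update((i1 + k, j1 + k) for k in range(i2 - i1))
        elif tag == "replace":
            # Edited span (assumption): link every system token to every edited token.
            links.update((i, j) for i in range(i1, i2) for j in range(j1, j2))
        # "delete" and "insert" spans produce no links.
    return links


def pivot_alignment(src_mt_links, mt_edit_links):
    """Compose source->MT links with MT->edit links through the MT pivot."""
    by_pivot = {}
    for m, e in mt_edit_links:
        by_pivot.setdefault(m, set()).add(e)
    return {(s, e) for s, m in src_mt_links for e in by_pivot.get(m, ())}


# Hypothetical example: the user corrects "home" to "house".
src = "la casa azul".split()
mt = "the blue home".split()
edit = "the blue house".split()
src_mt = {(0, 0), (1, 2), (2, 1)}  # assumed decoder links: la-the, casa-home, azul-blue
print(sorted(pivot_alignment(src_mt, monolingual_alignment(mt, edit))))
# → [(0, 0), (1, 2), (2, 1)]
```

In the composed alignment the source word "casa" ends up linked to the user's correction "house" rather than to the system's "home", which is the kind of error-focused link the alignment step is meant to recover.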
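The third step's linear combination can be written as p(t|s) = λ·p_base(t|s) + (1 − λ)·p_feedback(t|s), where p_base comes from the original system and p_feedback from the model built on user edits. The sketch below applies this to two phrase tables stored as plain dictionaries. The dictionary representation, the single probability per phrase pair, the name `interpolate_tables`, and the default λ = 0.7 are assumptions for illustration; a real system would tune λ on held-out data and typically combine several translation features.

```python
def interpolate_tables(base, feedback, weight=0.7):
    """Linearly interpolate two phrase translation tables.

    base, feedback: dict mapping (source_phrase, target_phrase) -> probability.
    weight: hypothetical lambda given to the base table (assumed value).
    A pair missing from one table contributes probability 0 from that table.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    combined = {}
    for pair in base.keys() | feedback.keys():
        combined[pair] = (weight * base.get(pair, 0.0)
                          + (1.0 - weight) * feedback.get(pair, 0.0))
    return combined


# Hypothetical tables: user edits favour "house" over "home" for "casa".
base = {("casa", "home"): 0.6, ("casa", "house"): 0.4}
feedback = {("casa", "house"): 1.0}
combined = interpolate_tables(base, feedback)
print(round(combined[("casa", "home")], 2), round(combined[("casa", "house")], 2))
# → 0.42 0.58
```

Because the result is a convex combination, if each input table sums to 1 over the target phrases of a given source phrase, so does the combined table (here 0.42 + 0.58 = 1.0).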


Year: 2015
Journal: THE JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
DOI: https://dx.doi.org/10.1613/jair.4716
Citation: Leveraging online user feedback to improve statistical machine translation / Formiga, Lluís and Barrón-Cedeño, Alberto and Màrquez, Lluís and Henríquez, C.A. and Mariño, J.B. - In: THE JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH. - ISSN 1076-9757. - Electronic. - 54:(2015), pp. 159-192. [10.1613/jair.4716]
All authors: Formiga, Lluís and Barrón-Cedeño, Alberto and Màrquez, Lluís and Henríquez, C.A. and Mariño, J.B.
Appears in types: 1.01 Journal article

Files in this item:

File	Size	Format
10961-Article Text-20448-1-10-20180216.pdf (open access; type: publisher's version (PDF); license: Open Access license, another license type compatible with Open Access)	530.78 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/707800
