Assessing the Use of Terminology in Phrase-Based Statistical Machine Translation for Academic Course Catalogue Translation

Randy Scansani; Marcello Federico; Luisa Bentivogli

2017

Abstract

In this contribution we describe an approach to evaluating the use of terminology in a phrase-based statistical machine translation (MT) system that translates course unit descriptions from Italian into English. This genre is among those that universities in European countries where English is not a native language most often need translated. Two MT engines are trained on an in-domain bilingual corpus and a subset of the Europarl corpus, and one of them is enhanced by adding a bilingual termbase to its training data. Overall system performance is assessed with the BLEU score, whereas the F-score is used to focus the evaluation on term translation. In addition, a manual analysis of the terms is carried out. Results suggest that in some cases, despite the simplistic approach used to inject terms into the MT system, the termbase was able to bias the word choice of the engine.
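As an illustration of what a term-focused F-score can look like, the sketch below counts termbase entries in each MT output and its reference translation and derives precision, recall and F1 from the clipped matches. It is a minimal sketch under stated assumptions, not the authors' evaluation procedure: the tokenization, the exact-match criterion, the function names (`tokenize`, `count_terms`, `term_f_score`) and the toy sentences are all hypothetical, since the abstract does not specify how terms were matched.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (an assumed, very simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def count_terms(text, terms):
    """Count exact token-sequence occurrences of each term in the text."""
    tokens = tokenize(text)
    counts = Counter()
    for term in terms:
        term_tokens = tokenize(term)
        n = len(term_tokens)
        if n == 0:
            continue
        hits = sum(
            1
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == term_tokens
        )
        if hits:
            counts[term] = hits
    return counts


def term_f_score(hypotheses, references, terms):
    """Return (precision, recall, f1) over target-side term occurrences.

    A term occurrence in the MT output counts as correct only if the
    reference for the same segment contains it too (clipped counts).
    """
    matched = hyp_total = ref_total = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_counts = count_terms(hyp, terms)
        ref_counts = count_terms(ref, terms)
        matched += sum(min(c, ref_counts[t]) for t, c in hyp_counts.items())
        hyp_total += sum(hyp_counts.values())
        ref_total += sum(ref_counts.values())
    precision = matched / hyp_total if hyp_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
terms = ["machine translation", "linear algebra", "translation memory"]
hyps = [
    "The course covers machine translation and linear algebra.",
    "Students learn to use a translation memory.",
]
refs = [
    "The course covers machine translation and linear algebra.",
    "Students learn to use translation tools.",
]
precision, recall, f1 = term_f_score(hyps, refs, terms)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Exact matching of this kind would miss inflected or reordered term variants, which is one reason a manual analysis of the terms is a useful complement to the automatic score.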

Year: 2017
Volume title: Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017), Rome, Italy, December 11-13, 2017
First page: 1
Last page: 6
Series: CEUR Workshop Proceedings
Citation: Scansani, R., Federico, M., Bentivogli, L. (2017). Assessing the Use of Terminology in Phrase-Based Statistical Machine Translation for Academic Course Catalogue Translation. CEUR-WS.org.
All authors: Randy Scansani, Marcello Federico, Luisa Bentivogli
Appears in types: 4.01 Conference proceedings contribution

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/622619
