This paper describes an approach to translating course unit descriptions from Italian and German into English using a phrase-based machine translation (MT) system. This genre is prominent among the texts that universities need to translate in European countries where English is not a native language. For each language combination, an in-domain bilingual corpus of course unit and degree program descriptions is used to train an MT engine, whose output is then compared to that of a baseline engine trained on the Europarl corpus. In a subsequent experiment, a bilingual terminology database is added to the training sets of both engines, and its impact on output quality is evaluated using BLEU and post-editing scores. Results suggest that the use of domain-specific corpora boosts the engines' quality for both language combinations, especially for German-English, whereas adding terminological resources does not seem to bring notable benefits.
Scansani, R. (2017). Enhancing Machine Translation of Academic Course Catalogues with Terminological Resources. Shoumen: Association for Computational Linguistics [10.26615/978-954-452-042-7_001].
Enhancing Machine Translation of Academic Course Catalogues with Terminological Resources
Randy Scansani; Silvia Bernardini; Adriano Ferraresi; Federico Gaspari; Marcello Soffritti
2017