How to use corpora for translation

Silvia Bernardini
2022

Abstract

This chapter describes translation-relevant types of corpora and the main ways in which they can be used to (learn to) translate and to study translation. Virtually all professional translators nowadays, and also most non-professional translators and students of translation, are familiar with translation memories (TMs). These resources, which lie at the core of computer-assisted translation (CAT) tools, consist of databases of aligned source text (ST) and target text (TT) segment pairs, where a segment is usually the size of a sentence. The same recycling principle and the same textual resources also underlie current approaches to machine translation (MT). Corpora can thus be said to be the engine that has propelled the two major transformations witnessed in the translation world since the 1990s: CAT and, more recently, MT. However, this role has remained somewhat hidden, since the main emphasis has been on the efficient retrieval of translation matches by more or less sophisticated algorithms. While responsibility for reviewing and approving suggestions made by CAT tools and for post-editing machine-translated output is bound to remain with the translator, in CAT and MT it is the software that does most of the corpus-related work, and translators may be only vaguely aware of the inner workings of the technology they use daily. In the type of corpus work described in this chapter, by contrast, corpora and corpus users take centre stage; efficient retrieval is not a priority, and responsibility for querying corpora and for interpreting results remains with the user.
In: The Routledge Handbook of Corpus Linguistics (2nd edition), pp. 485–498
Use this identifier to cite or link to this document: http://hdl.handle.net/11585/858530