This article introduces a method to identify and classify translation equivalences in multilingual news texts and applies it to the task of creating a corpus for the study of news translation, a notably challenging area within Translation Studies. The dataset is composed of 41 Greek-English news dispatches on the topic of migration by AMNA, the Greek national news agency. Conceptually, we build on previous research on ‘comparallel’ corpus architectures, which bring together features of comparable and parallel corpora and provide the necessary flexibility to account for the non-prototypical translated data characterizing multilingual news. The automated method uses state-of-the-art Natural Language Processing techniques, namely sentence and word embeddings, which make it possible to account for nuanced translation relationships, distinguishing between translated, partially translated, related, and unrelated sentence pairs. We test the method against a benchmark of manually annotated sentences from the AMNA dataset and provide examples of correctly and incorrectly classified sentence pairs. We finally build a fully-fledged comparallel corpus based on the dataset and present a case study demonstrating how the corpus can be leveraged for corpus-assisted studies of news discourse, and most notably to investigate newsworthiness and ideological shifts occurring in multilingual news.
Ferraresi, A., Pistolia, E. (2025). Navigating translation equivalence in news corpora: Construction and analysis of a comparallel Greek-English corpus on migration. Translation and Translanguaging in Multilingual Contexts, 11(3), 334-360 [10.1075/ttmc.00171.fer].
Navigating translation equivalence in news corpora: Construction and analysis of a comparallel Greek-English corpus on migration
Ferraresi, Adriano; Pistolia, Elton
2025
| File | Size | Format |
|---|---|---|
| FerraresiPistolia25_postprint.pdf (open access). Description: postprint. Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review. License: Open Access license; other type of license compatible with Open Access. | 734.8 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


