Liu, N., Russo, M. (2025). A value-sensitive metadata schema for interpreting corpora: Implementation on the Unified Interpreting Corpus (UNIC) platform. Interpreting, 27(2), 157-196. DOI: 10.1075/intp.00123.liu
A value-sensitive metadata schema for interpreting corpora: Implementation on the Unified Interpreting Corpus (UNIC) platform
Liu, N.; Russo, M.
2025
Abstract
Interpreting corpora serve as the descriptive foundation of research and the ‘ground truth’ against which machine interpreting technologies are evaluated. However, access to corpora remains a critical bottleneck in interpreting studies due to data collection and processing challenges and the absence of interpreting- and translation-specific corpus publication venues. In this article, we present two technical infrastructures that facilitate corpus access: a metadata schema which standardises corpus description and the Unified Interpreting Corpus (UNIC) platform for data and metadata search and publication. Guided by the internationally established FAIR (findability, accessibility, interoperability and reusability) and CARE (collective benefit, authority to control, responsibility and ethics) principles for scientific data management and stewardship, we designed the infrastructures based on a review of 125 spoken and signed language interpreting corpora, relevant international standards and community knowledge and also by using open-source technologies. Feedback obtained from interpreting students, researchers and interpreters demonstrates greater perceived usefulness of and satisfaction with UNIC compared to general-purpose search portals. Overall, we illustrate a value- and consensus-driven path towards optimising the use of interpreting corpora and the careful curation of new ones, which avoids the duplication of effort, helps to chart research directions and fosters co-design with communities.

| File | Size | Format |
|---|---|---|
| Liu&Russo_A value-sensitive metadata schema for interpreting corpora.pdf (open access). Type: Postprint / Author's Accepted Manuscript (AAM), the version accepted for publication after peer review. Licence: Open Access licence, Creative Commons Attribution (CC BY). | 2.23 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


