Russo, M., Liu, N. (in press). A value-sensitive metadata schema for interpreting corpora: Implementation on the Unified Interpreting Corpus (UNIC) platform. INTERPRETING, X, 0-1.
A value-sensitive metadata schema for interpreting corpora: Implementation on the Unified Interpreting Corpus (UNIC) platform
Russo, M.; Liu, N.
First author
In press
Abstract
Interpreting corpora serve as the descriptive foundation of research and the ‘ground truth’ against which machine interpreting technologies are evaluated. However, access to corpora remains a critical bottleneck in interpreting studies, owing to data collection and processing challenges and the absence of interpreting- and translation-specific corpus publication venues. In this article, we present two technical infrastructures that facilitate corpus access: a metadata schema standardising corpus description, and the Unified Interpreting Corpus (UNIC; link redacted) platform for data and metadata search and publication. Guided by the internationally established FAIR (findability, accessibility, interoperability, and reusability) and CARE (collective benefit, authority to control, responsibility, and ethics) principles for scientific data management and stewardship, we designed the infrastructures using open-source technologies, drawing on a review of 125 spoken and signed language interpreting corpora, relevant international standards, and community knowledge. Feedback from interpreting students, researchers, and interpreters indicates that they perceive UNIC as more useful and more satisfying than general-purpose search portals. Overall, we illustrate a value- and consensus-driven path towards optimising the use of existing interpreting corpora and carefully curating new ones, a path that avoids duplication of effort, helps chart research directions, and fosters co-design with communities.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.