This contribution has a twofold aim. On the one hand, it highlights the challenges and problems that compilers of (simultaneous) interpreting and intermodal corpora are likely to face, and the solutions that were found and applied in three corpora of European Parliament plenary debates, namely EPIC, EPICG and EPTIC. On the other, it provides an accessible step-by-step guide for corpus developers working with European Parliament data, with the ultimate aim of harmonising, as far as possible, the procedures used to transcribe the audio tracks, record metadata, annotate texts with part-of-speech and lemma information, perform text-to-text and text-to-audio/video alignment, and index the corpus for searching with appropriate corpus query tools. By adopting shared corpus building methods, it might be possible to leverage the substantial efforts already deployed by different research groups, and to encourage others to take charge of new language pairs. This, we shall argue, might lead to a massively multilingual interpreting and intermodal corpus, built through a distributed community effort.
Bernardini, S., Ferraresi, A., Russo, M., Collard, C., & Defrancq, B. (2018). Building Interpreting and Intermodal Corpora: A How to for a Formidable Task. Singapore: Springer [10.1007/978-981-10-6199-8_2].
Building Interpreting and Intermodal Corpora: A How to for a Formidable Task
Bernardini, S. (Conceptualization); Ferraresi, A. (Conceptualization); Russo, M. (Conceptualization)
2018