The European Parliament Interpreting Corpus (EPIC): implementation and developments

Russo, Mariachiara; Bendazzoli, Claudio; Sandrelli, Annalisa; Spinolo, Nicoletta

The call for the creation of corpora in Interpreting Studies that could be queried by means of Corpus Linguistics tools was first made by Shlesinger (1998) over a decade ago. However, only recently has this need started to be met. The European Parliament Interpreting Corpus (EPIC) is one of the first machine-readable corpora to be openly accessible in the field of Interpreting Studies. It was created between 2004 and 2006 by the Directionality Research Group of the University of Bologna at Forlì, and consists of nine sub-corpora: three sub-corpora of source language speeches (Italian, English and Spanish) and six sub-corpora of simultaneously interpreted speeches, thus covering all possible directions and combinations of the three languages involved (Monti et al. 2005, Sandrelli et al. 2010). At present, the corpus includes only a small part of the recorded material, all of which is stored in the EPIC Multimedia Archive. The present paper describes the steps undertaken to create the corpus and the ongoing developments to expand it and improve its structure. Firstly, the methodology used for user-friendly data collection and transcription, and for the part-of-speech (POS) tagging and lemmatisation of this open corpus, will be described; then, the web interface developed to carry out simple and advanced queries online will be illustrated (see http://sslmitdev-online.sslmit.unibo.it/corpora/corporaproject.php?path=E.P.I.C.). Examples of the corpus-based studies carried out so far will be provided (Russo et al. 2006, Bendazzoli et al. 2011), and special emphasis will be placed on the great potential of EPIC as a pedagogical and research tool in interpreter training. Interpreting students can transcribe and analyse part of the recorded material stored in the EPIC Multimedia Archive for their graduation dissertations, thus taking advantage of a unique opportunity to reflect upon real-life professional interpreting performances and upon their own learning process.
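The nine-way structure described above (three source-language sub-corpora plus six interpreted sub-corpora, one per ordered language pair) can be illustrated with a minimal sketch. The language codes and the `org-`/`int-` labels used here are hypothetical names chosen for illustration, not necessarily those used in the EPIC interface.

```python
from itertools import permutations

# Illustrative language codes for the three EPIC languages (assumed labels).
LANGUAGES = ("it", "en", "es")


def subcorpus_labels(languages=LANGUAGES):
    """Return hypothetical labels for source and interpreted sub-corpora."""
    # One sub-corpus of original (source) speeches per language.
    sources = [f"org-{lang}" for lang in languages]
    # One sub-corpus of interpreted speeches per ordered (source, target) pair.
    interpreted = [f"int-{src}-{tgt}" for src, tgt in permutations(languages, 2)]
    return sources + interpreted


if __name__ == "__main__":
    for label in subcorpus_labels():
        print(label)
```

With three languages there are 3 source sub-corpora and 3 × 2 = 6 directed pairs, which gives the nine sub-corpora mentioned in the abstract.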
Finally, ongoing developments and future steps will be discussed: text-to-sound and source text-to-target text alignment procedures are currently being tested, so as to make EPIC a more powerful resource for the interpreting research community.

References

BENDAZZOLI, C., SANDRELLI, A. AND M. RUSSO (2011) “Disfluencies in simultaneous interpreting: a corpus-based analysis”, in A. Kruger, K. Wallmach and J. Munday (eds.) Corpus-based Translation Studies: Research and Applications, London/New York: Continuum, 282-306.

MONTI, C., BENDAZZOLI, C., SANDRELLI, A. AND M. RUSSO (2005) “Studying Directionality in Simultaneous Interpreting through an Electronic Corpus: EPIC (European Parliament Interpreting Corpus)”, paper presented at the International Symposium “Pour une traductologie proactive” organised for the 50th anniversary of META, University of Montreal, 6th-9th April 2005 (vol. 50:4). Online: http://www.erudit.org/revue/meta/2005/v50/n4/019850ar.pdf

RUSSO, M., BENDAZZOLI, C. AND A. SANDRELLI (2006) “Looking for Lexical Patterns in a Trilingual Corpus of Source and Interpreted Speeches: Extended Analysis of EPIC (European Parliament Interpreting Corpus)”, Forum, vol. 4:1, 221-254.

SANDRELLI, A., BENDAZZOLI, C. AND M. RUSSO (2010) “European Parliament Interpreting Corpus (EPIC): Methodological issues and preliminary results on lexical patterns in SI”, International Journal of Translation 22 (1-2), 165-203.

SHLESINGER, M. (1998) “Corpus-based interpreting studies as an offshoot of corpus-based translation studies”, META, vol. 43:4, 486-493.

Russo, M., Bendazzoli, C., Sandrelli, A., Spinolo, N. (2012). The European Parliament Interpreting Corpus (EPIC): implementation and developments. Bern: Peter Lang.