Open corpus interface for Italian language learning

Borghetti, Claudia; Dittmann, Henrik
2013

Abstract

In this article, we present the multifaceted interface to the open PAISÀ corpus of Italian. Created within the PAISÀ project (Piattaforma per l’Apprendimento dell’Italiano Su corpora Annotati) [1], the corpus is designed to be free for the public to process, use and distribute. All documents included in the corpus are licensed under Creative Commons [2] and are automatically annotated with lemma, part-of-speech and dependency information. The accompanying interface is designed to provide flexible, powerful, and easy-to-use modes of corpus access in support of language learning, language practice and linguistic analysis. We present the interface's functionalities in detail and discuss the underlying design decisions.

First, we introduce the four principal components of the interface; then we present two specialized features added to increase the interface's usefulness in the language learning context. The main search components are (1) a basic search that adopts a “Google-style” search box, (2) an advanced search that provides elaborate graphical search options, and (3) a CQP search that makes use of CQP, the powerful query language of the Open Corpus Workbench [3]. A fourth component retrieves full-text corpus documents based on keyword searches and also provides the means for building temporary sub-corpora on specific topics.

The first specialized feature, integrated into each search component, is a function for restricting search results to sentences of limited complexity, which can be particularly helpful to novice language learners. The selection is based on formal text characteristics such as sentence length and vocabulary. The second innovative feature is the provision of different display formats for search results. Besides the established KWIC (KeyWord In Context) and full-sentence views, entire corpus documents can be accessed, and visual representations of the dependency relations and of keyword distributions are available. These partly interactive graphical representations build on recent developments in information visualization for language data, and are based on a visualization for dependency graphs [4] and one for word clouds [5].

Finally, we show how the PAISÀ interface can be employed in different language teaching tasks. In particular, we present a complete unit of work aimed at learners of Italian (CEFR level B1 or above) and centered on students’ direct use of the PAISÀ interface and its functionalities. In doing so, we give concrete examples of targeted searches and interactions with the provided language material, and illustrate how use of the corpus can be integrated with communicative language activities in the classroom.

[1] www.corpusitaliano.it, co-financed by the Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR).
[2] http://creativecommons.org/
[3] Evert, S. and Hardie, A. (2011). ‘Twenty-first century Corpus Workbench: Updating a query architecture for the new millennium’. In: Proceedings of the Corpus Linguistics 2011 conference, University of Birmingham, UK.
[4] Culy, C., Lyding, V., and Dittmann, H. (2011). ‘xLDD: Extended Linguistic Dependency Diagrams’. In: Proceedings of the 15th International Conference on Information Visualisation, 12-15 July 2011, University of London, UK.
[5] http://code.google.com/p/visapi-gadgets/
Conference proceeding. International conference ICT for language learning, 6th edition
Pages 244-249
Verena Lyding; Claudia Borghetti; Henrik Dittmann; Lionel Nicolas; Egon Stemle

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/592979