Mirić, M., Miličević Petrović, M., Ćirković, S. (2021). Digitalizacija jezika i kulture kroz elektronske korpuse: primer timočkih govora [Digitalization of language and culture through electronic corpora: The case of the Timok vernacular]. Belgrade: Savez slavističkih društava Srbije - Filološki fakultet [10.18485/mks_dh_skn.2021.1.ch7].
Digitalizacija jezika i kulture kroz elektronske korpuse: primer timočkih govora
Miličević Petrović, Maja
2021
Abstract
The aim of this paper is to describe the ongoing development of an electronic corpus of the Timok vernacular, a rare example of an oral dialect corpus of the Serbian language. The corpus comprises data relevant to both linguistics and studies of (traditional) culture, and as such it can help bridge the current gap between corpus linguistics and digital humanities. The material in the corpus is the result of fieldwork research conducted between 2015 and 2017, mainly within the project Protecting the intangible culture of the Timok vernacular. The paper outlines the phases of the fieldwork research, in particular the selection of villages and participants, as well as the open-ended interview methodology applied in data collection. The steps in corpus development are presented next: transcription, annotation (part-of-speech tagging, lemmatization, normalization), and the resulting search options. In addition, an overview of previous and ongoing studies based on the collected material is provided, covering the domains of dialectology, Balkan linguistics, and socio-, areal and anthropological linguistics, as well as studies of folklore and traditional culture, with suggestions for future research in these domains.
File | Type | License | Size | Format
---|---|---|---|---
Miric et al (2021).pdf (open access) | Postprint | Free open-access license | 286.68 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.