Digitalizacija jezika i kulture kroz elektronske korpuse: primer timočkih govora [Digitizing language and culture through electronic corpora: the example of the Timok vernacular]

Miličević Petrović, Maja
2021

Abstract

This paper describes the ongoing development of an electronic corpus of the Timok vernacular, a rare example of an oral dialect corpus of the Serbian language. The corpus comprises data relevant to both linguistics and the study of (traditional) culture, and as such it can help bridge the current gap between corpus linguistics and digital humanities. The material in the corpus results from fieldwork conducted between 2015 and 2017, mainly within the project "Protecting the intangible culture of the Timok vernacular". The paper outlines the phases of the fieldwork, in particular the selection of villages and participants, as well as the open-ended interview methodology applied in data collection. The steps in corpus development are presented next: transcription, annotation (part-of-speech tagging, lemmatization, normalization), and the resulting search options. In addition, an overview of previous and ongoing studies based on the collected material is provided, covering the domains of dialectology, Balkan linguistics, socio-, areal and anthropological linguistics, as well as studies of folklore and traditional culture, with suggestions for future research in these domains.
In: Digitalna humanistika i slovensko kulturno nasleđe [Digital humanities and Slavic cultural heritage], pp. 75-94
Mirić, Mirjana; Miličević Petrović, Maja; Ćirković, Svetlana
Files in this record:
Miric et al (2021).pdf (open access; type: postprint; license: free open-access license; size: 286.68 kB; format: Adobe PDF)

Use this identifier to cite or link to this document: http://hdl.handle.net/11585/886705