In this chapter we describe the recent developments in language technology infrastructure building for three South Slavic languages – Slovenian, Croatian, and Serbian. These developments are primarily the result of intense coordination between different projects. Our experience shows that the infrastructure for language technologies can be significantly improved even in countries with a less favourable socio-economic situation, such as Croatia and Serbia, where insufficient organizational capacity and funding are available for a standard, top-down development. We suggest that such countries can adopt a bottom-up approach in which even minor, personal, or topically marginal projects are coordinated within the emerging community. Furthermore, such bottom-up environments can benefit from coordination with other similar environments, in our case in Croatia or Serbia. We further propose that bottom-up approaches can profit from coordination with top-down environments in neighbouring and/or culturally close countries, Slovenia in our case, with both sides experiencing a positive impact. We illustrate the synergistic effect of these different types of collaboration and coordination on the examples of textual data harvesting, manual data annotation, language tool development, and general infrastructure building. We wrap up with the most recent development – a CLARIN knowledge centre for South Slavic languages, where the collaborative methodology is expanded to all South Slavic languages. We close the chapter with a set of suggestions and good practices for researchers and language communities in a similar position to the ones discussed in this chapter.
Ljubešić, N., Erjavec, T., Miličević Petrović, M., Samardžić, T. (2022). Together We Are Stronger: Bootstrapping Language Technology Infrastructure for South Slavic Languages with CLARIN.SI. Berlin: De Gruyter. DOI: 10.1515/9783110767377-017.
Together We Are Stronger: Bootstrapping Language Technology Infrastructure for South Slavic Languages with CLARIN.SI
Miličević Petrović, Maja
2022
File | Type | Licence | Size | Format
---|---|---|---|---
Ljubesic_et_al_2022_Together_we_are_stroneger_CLARIN.pdf (open access) | Publisher's version (PDF) | Open Access licence, Creative Commons Attribution (CC BY) | 573.75 kB | Adobe PDF