In this paper we carry out a cross-lingual comparison of nonstandard features in the language of social media for Slovene, Croatian and Serbian. The goal of the analysis is twofold: (1) we try to establish the extent to which the observed phenomena are universal rather than language-specific, and (2) we propose an approach for automatic scoring of (non)standardness levels of user-generated content, which can be used as a separate annotation layer in corpora. Quantitative and qualitative analyses of the results show that the majority of the language used on Twitter is fairly standard, especially in Slovene and Croatian. The prevalent characteristic of nonstandard Slovene tweets is nonstandard orthography, while nonstandard lexis is more typical of Serbian tweets, possibly due to a younger user profile.

Darja Fišer, Tomaž Erjavec, Nikola Ljubešić, Maja Miličević (2015). Comparing the nonstandard language of Slovene, Croatian and Serbian tweets. Ljubljana : Faculty of Arts, University of Ljubljana.

Comparing the nonstandard language of Slovene, Croatian and Serbian tweets

Maja Miličević
2015

Symposium Obdobja 34 - Grammar and Dictionary - Current language description (Part 1), pp. 225-231
Darja Fišer; Tomaž Erjavec; Nikola Ljubešić; Maja Miličević
File: Fiser_et_al_2015.pdf (open access; postprint; free open-access licence; Adobe PDF, 141.44 kB)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/775845