ReLDI-NormTagNER-sr 3.0 is a manually annotated corpus of Serbian tweets. It is intended as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity recognition of non-standard Serbian. Each tweet additionally carries automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). This version corrects various annotation mistakes and is encoded in the CoNLL-U-Plus format, in line with other linguistic training datasets for Croatian and Serbian. The continuous improvement of the dataset is led by the CLASSLA knowledge centre for South Slavic languages (https://www.clarin.si/info/k-centre/) and the ReLDI Centre Belgrade.
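
Since the corpus is distributed in the CoNLL-U-Plus format, a reader may find it useful to see how such a file can be loaded. The following is a minimal sketch, not the maintainers' tooling. It relies only on the general CoNLL-U Plus conventions: a `# global.columns = ...` header naming the columns, tab-separated token lines, and blank lines between sentences. The function name `read_conllu_plus`, the column names `EXAMPLE:NORM` and `EXAMPLE:NE`, and the sample sentence are hypothetical placeholders and are not taken from the dataset.

```python
from typing import Dict, Iterable, List

def read_conllu_plus(lines: Iterable[str]) -> List[dict]:
    """Parse CoNLL-U Plus text into a list of sentences.

    Each sentence is a dict with 'meta' (comment key/value pairs)
    and 'tokens' (one dict per token, keyed by column name).
    """
    columns: List[str] = []
    sentences: List[dict] = []
    meta: Dict[str, str] = {}
    tokens: List[Dict[str, str]] = []

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if one is open.
            if tokens:
                sentences.append({"meta": meta, "tokens": tokens})
                meta, tokens = {}, []
            continue
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            key, value = key.strip(), value.strip()
            if key == "global.columns":
                # Column names are whitespace-separated in the header.
                columns = value.split()
            elif sep:
                meta[key] = value
            continue
        if not columns:
            raise ValueError("token line found before '# global.columns'")
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        tokens.append(dict(zip(columns, fields)))

    # Handle a final sentence that is not followed by a blank line.
    if tokens:
        sentences.append({"meta": meta, "tokens": tokens})
    return sentences

# Illustrative input only: the column names and values are assumptions.
SAMPLE = (
    "# global.columns = ID FORM LEMMA UPOS EXAMPLE:NORM EXAMPLE:NE\n"
    "# sent_id = example.1\n"
    "1\tbg\tBeograd\tPROPN\tBeograd\tB-loc\n"
    "2\t.\t.\tPUNCT\t.\tO\n"
    "\n"
)

sentences = read_conllu_plus(SAMPLE.splitlines(keepends=True))
first_token = sentences[0]["tokens"][0]
print(first_token["FORM"], "->", first_token["EXAMPLE:NORM"])
```

To apply the sketch to the real corpus, pass an open file handle instead of `SAMPLE` and use the column names declared in that file's own `# global.columns` line.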

Ljubešić, N., Erjavec, T., Batanović, V., Miličević, M., Samardžić, T. (2023). Serbian Twitter training corpus ReLDI-NormTagNER-sr 3.0.

Serbian Twitter training corpus ReLDI-NormTagNER-sr 3.0

Miličević, Maja
2023

Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk; Miličević, Maja; Samardžić, Tanja

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1017888