ReLDI-NormTagNER-hr 3.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity recognition of non-standard Croatian. Each tweet is also annotated for its automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). This version of the dataset has various annotation errors corrected and the dataset encoded in the CoNLL-U-Plus format, similar to other manually annotated linguistic datasets for Croatian and Serbian. The continuous improvement of this dataset is led by the CLASSLA knowledge centre for South Slavic languages (https://www.clarin.si/info/k-centre/) and the ReLDI Centre Belgrade.

Ljubešić, N., Erjavec, T., Batanović, V., Miličević, M., Samardžić, T. (2023). Croatian Twitter training corpus ReLDI-NormTagNER-hr 3.0.

Croatian Twitter training corpus ReLDI-NormTagNER-hr 3.0

Miličević, Maja;
2023

Abstract

ReLDI-NormTagNER-hr 3.0 is a manually annotated corpus of Croatian tweets. It is meant as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity recognition of non-standard Croatian. Each tweet is also annotated for its automatically assigned standardness levels (T = technical standardness, L = linguistic standardness). This version of the dataset has various annotation errors corrected and the dataset encoded in the CoNLL-U-Plus format, similar to other manually annotated linguistic datasets for Croatian and Serbian. The continuous improvement of this dataset is led by the CLASSLA knowledge centre for South Slavic languages (https://www.clarin.si/info/k-centre/) and the ReLDI Centre Belgrade.
2023
Ljubešić, N., Erjavec, T., Batanović, V., Miličević, M., Samardžić, T. (2023). Croatian Twitter training corpus ReLDI-NormTagNER-hr 3.0.
Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk; Miličević, Maja; Samardžić, Tanja
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/1017889
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact