Easily accessible language technologies for Slovene, Croatian and Serbian

Nikola Ljubešić; Tomaž Erjavec; Darja Fišer; Tanja Samardžić; Maja Miličević; Filip Klubička; Filip Petkovski

2016

Abstract

In this paper we present a pipeline of recently developed language technology tools for Slovene, Croatian and Serbian. They currently cover text segmentation, text normalisation, part-of-speech tagging, lemmatisation and inflectional lexicon lookup. Most rely on machine learning approaches, such as statistical machine translation and conditional random fields, which are capable of producing high-quality models for the phenomena covered. Special emphasis is placed on making these tools easily accessible: the tools and the trained models for all three languages are offered (1) as open source via public git repositories and (2) online as web applications and web services.


Year: 2016
Volume title: Proceedings of the Conference on Language Technologies & Digital Humanities
First page: 120
Last page: 124
Citation: Nikola Ljubešić, Tomaž Erjavec, Darja Fišer, Tanja Samardžić, Maja Miličević, Filip Klubička, et al. (2016). Easily accessible language technologies for Slovene, Croatian and Serbian.
All authors: Nikola Ljubešić; Tomaž Erjavec; Darja Fišer; Tanja Samardžić; Maja Miličević; Filip Klubička; Filip Petkovski
Appears in type: 4.01 Contribution in conference proceedings


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/775531

Warning: the data displayed have not been validated by the university.
