PoS-tagging Italian texts with CORISTagger / Tamburini F. - Electronic. - 1:(2009), pp. ---. (Paper presented at the EVALITA 2009 conference, AI*IA Workshop on Evaluation of NLP and Speech Tools for Italian, held in Reggio Emilia on December 12th 2009).
PoS-tagging Italian texts with CORISTagger
TAMBURINI, FABIO
2009
Abstract
This paper presents an evolution of CORISTagger [1], a high-performance PoS-tagger for Italian developed at the University of Bologna. The system consists of a second-order Hidden Markov Model tagger followed by a Transformation-Based tagger. This stacked structure, paired with a powerful morphological analyser based on a large lexicon of 120,000 lemmas, allowed the tagger to perform well in the EVALITA 2009 PoS-tagging task. The tagger's performance and its most common classification errors are discussed in detail.
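To illustrate the general idea of a stacked tagger, here is a minimal sketch. It is not CORISTagger's implementation. A second-order (trigram) HMM is decoded with Viterbi over pairs of previous tags, and a transformation-based stage then rewrites tags using ordered contextual rules. The tagset, the probabilities, the Italian toy words, the rule format "change FROM to TO when the previous tag is PREV" and all function names are hypothetical. Smoothing, unknown-word handling and the lexicon-based morphological analyser are omitted.

```python
import math

# Hypothetical toy tagset and probabilities; NOT CORISTagger's model.
TAGS = ["DET", "NOUN", "VERB"]
START = "<s>"

# TRANS[(t2, t1, t)] = P(t | t2, t1); EMIT[(t, w)] = P(w | t). Missing = 0.
TRANS = {
    (START, START, "DET"): 0.6, (START, START, "NOUN"): 0.2,
    (START, START, "VERB"): 0.2,
    (START, "DET", "NOUN"): 0.5, (START, "DET", "VERB"): 0.5,
}
EMIT = {("DET", "la"): 0.5, ("NOUN", "porta"): 0.1, ("VERB", "porta"): 0.3}

# Hypothetical transformation rules: (from_tag, to_tag, required_previous_tag).
RULES = [("VERB", "NOUN", "DET")]


def log(p):
    """Log-probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_trigram(words, trans, emit):
    """Second-order HMM Viterbi decoding; states are (previous tag, tag) pairs."""
    delta = {(START, START): 0.0}   # best log-score for each last-two-tags pair
    backptrs = []                   # per position: pair -> best tag two steps back
    for w in words:
        new_delta, back = {}, {}
        for (t2, t1), score in delta.items():
            for t in TAGS:
                s = score + log(trans.get((t2, t1, t), 0)) + log(emit.get((t, w), 0))
                if s > new_delta.get((t1, t), float("-inf")):
                    new_delta[(t1, t)] = s
                    back[(t1, t)] = t2
        if not new_delta:
            raise ValueError(f"no tag sequence with non-zero probability at {w!r}")
        delta = new_delta
        backptrs.append(back)
    # Follow back-pointers from the best final pair.
    prev, cur = max(delta, key=delta.get)
    tags = [cur]
    for i in range(len(words) - 1, 0, -1):
        tags.append(prev)
        prev, cur = backptrs[i][(prev, cur)], prev
    return tags[::-1]


def apply_rules(tags, rules):
    """Apply each rule in order; contexts are read from the pre-rule sequence."""
    for frm, to, prev in rules:
        old = list(tags)
        tags = [to if t == frm and i > 0 and old[i - 1] == prev else t
                for i, t in enumerate(old)]
    return tags


def tag(words):
    """Stacked tagging: HMM first pass, then transformation-based correction."""
    return apply_rules(viterbi_trigram(words, TRANS, EMIT), RULES)


print(tag(["la", "porta"]))  # → ['DET', 'NOUN']
```

With these toy numbers the HMM alone labels "porta" as VERB because its emission probability is higher. The rule stage then corrects it to NOUN after a determiner. This shows the division of labour the stacked design is meant to exploit: the statistical model supplies a first guess and the rules fix systematic contextual errors.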