This paper presents an evolution of CORISTagger [1], a high-performance PoS-tagger for Italian developed at the University of Bologna. The system is composed of a second-order Hidden Markov Model tagger followed by a transformation-based tagger. This stacked structure, paired with a powerful morphological analyser based on a large lexicon of 120,000 lemmas, allowed the tagger to obtain good performance in the EVALITA 2009 PoS-tagging task. The tagger's performance and its most common classification errors are discussed in detail.
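The stacked structure described in the abstract (a second-order HMM pass whose output is then corrected by transformation rules) can be illustrated with a minimal sketch. This is not CORISTagger's implementation: the function names, the toy probabilities, the three-tag tagset, and the single correction rule are all hypothetical, and the morphological analyser and lexicon are left out entirely.

```python
import math
from typing import Callable, Dict, List, Tuple

Tag = str
START = "<s>"  # hypothetical padding tag for the sentence start
Rule = Tuple[Tag, Tag, Callable[[List[str], List[Tag], int], bool]]

def viterbi_trigram(words: List[str], tags: List[Tag],
                    trans: Dict[Tuple[Tag, Tag, Tag], float],
                    emit: Dict[Tuple[Tag, str], float],
                    floor: float = 1e-6) -> List[Tag]:
    """Second-order HMM decoding: each step scores
    P(t_i | t_{i-2}, t_{i-1}) * P(w_i | t_i); unseen events get `floor`."""
    # A state is the pair (previous tag, current tag).
    best = {(START, START): (0.0, [])}
    for w in words:
        new_best = {}
        for (t2, t1), (logp, path) in best.items():
            for t in tags:
                p = trans.get((t2, t1, t), floor) * emit.get((t, w), floor)
                cand = logp + math.log(p)
                key = (t1, t)
                if key not in new_best or cand > new_best[key][0]:
                    new_best[key] = (cand, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

def apply_rules(words: List[str], tags: List[Tag], rules: List[Rule]) -> List[Tag]:
    """Transformation-based pass: apply each rule, in order, left to right."""
    tags = list(tags)
    for source, target, condition in rules:
        for i in range(len(tags)):
            if tags[i] == source and condition(words, tags, i):
                tags[i] = target
    return tags

def stacked_tag(words, tags, trans, emit, rules) -> List[Tag]:
    """Stage 1 (HMM) produces a tagging; stage 2 (rules) corrects it."""
    return apply_rules(words, viterbi_trigram(words, tags, trans, emit), rules)

# Toy model (made-up numbers, for illustration only).
TAGS = ["DET", "NOUN", "VERB"]
TRANS = {(START, START, "DET"): 0.8}
EMIT = {("DET", "la"): 0.9, ("NOUN", "porta"): 0.4, ("VERB", "porta"): 0.6}
# Hypothetical rule: VERB becomes NOUN when the previous tag is DET.
RULES = [("VERB", "NOUN", lambda w, t, i: i > 0 and t[i - 1] == "DET")]

words = ["la", "porta"]
hmm_only = viterbi_trigram(words, TAGS, TRANS, EMIT)
stacked = stacked_tag(words, TAGS, TRANS, EMIT, RULES)
print(hmm_only, stacked)
```

In this toy setting the HMM stage alone prefers the verb reading of "porta", and the rule stage rewrites it to a noun after the determiner. It only shows the general idea of letting a rule-based pass correct the statistical pass.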
Tamburini, F. (2009). PoS-tagging Italian texts with CORISTagger. Reggio Emilia: s.n.
PoS-tagging Italian texts with CORISTagger
Tamburini, Fabio
2009