PoS-tagging Italian texts with CORISTagger

TAMBURINI, FABIO
2009

Abstract

This paper presents an evolution of CORISTagger [1], a high-performance PoS-tagger for Italian developed at the University of Bologna. The system consists of a second-order Hidden Markov Model tagger followed by a Transformation-Based tagger. This stacked structure, paired with a powerful morphological analyser based on a large lexicon of 120,000 lemmas, enabled the tagger to perform well in the EVALITA 2009 PoS-tagging task. The tagger's performance and its most common classification errors are discussed in detail.
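
The abstract does not describe the implementation, so the following Python sketch is only a rough illustration of the stacked architecture it names, under loudly stated assumptions. A second-order (trigram) HMM is decoded with the Viterbi algorithm over tag pairs, the lexicon restricts each known word to its candidate tags (standing in for the morphological analyser), and Brill-style transformation rules then rewrite the HMM output. The tag set, the `LEXICON`, the `TRIGRAMS` probabilities, the uniform emissions, the omitted end-of-sentence transition and the single rule in `RULES` are all hypothetical toy choices, not CORISTagger's actual model, tagset or learned rules.

```python
import math

START = "<s>"
TAGS = ["DET", "PRON", "NOUN", "VERB", "ADJ"]  # hypothetical toy tag set

# Hypothetical lexicon standing in for the morphological analyser:
# each known word form maps to the set of tags it may take.
LEXICON = {
    "la": {"DET", "PRON"},
    "porta": {"NOUN", "VERB"},
    "è": {"VERB"},
    "aperta": {"ADJ", "VERB"},
}

# Hypothetical trigram probabilities P(t | t2, t1); unseen trigrams get FLOOR.
TRIGRAMS = {
    (START, START, "DET"): 0.6,
    (START, START, "PRON"): 0.2,
    (START, "DET", "NOUN"): 0.8,
    (START, "PRON", "VERB"): 0.7,
    ("DET", "NOUN", "VERB"): 0.7,
    ("NOUN", "VERB", "ADJ"): 0.5,
    ("NOUN", "VERB", "VERB"): 0.3,
}
FLOOR = 1e-4

# Hypothetical Brill-style rule: change NOUN to VERB when the previous tag is PRON.
RULES = [("NOUN", "VERB", "PRON")]


def candidates(word):
    # Known words are restricted to their lexicon tags; unknown words may take any tag.
    return sorted(LEXICON.get(word, TAGS))


def log_trans(t2, t1, t):
    return math.log(TRIGRAMS.get((t2, t1, t), FLOOR))


def log_emit(word, tag):
    # Simplification: uniform emission over the word's candidate tags.
    return -math.log(len(candidates(word)))


def viterbi_trigram(words):
    # Each state is the pair (previous tag, current tag); for every state we keep
    # the best log score and the best tag path ending in it (no end transition).
    delta = {(START, START): (0.0, [])}
    for w in words:
        new = {}
        for (t2, t1), (score, path) in delta.items():
            for t in candidates(w):
                s = score + log_trans(t2, t1, t) + log_emit(w, t)
                key = (t1, t)
                if key not in new or s > new[key][0]:
                    new[key] = (s, path + [t])
        delta = new
    return max(delta.values(), key=lambda v: v[0])[1]


def apply_tbl(tags, rules):
    # Apply each rule in order, scanning left to right; changes take effect immediately.
    tags = list(tags)
    for frm, to, prev in rules:
        for i in range(1, len(tags)):
            if tags[i] == frm and tags[i - 1] == prev:
                tags[i] = to
    return tags


def tag(words, rules=RULES):
    # Stacked pipeline: HMM decoding first, transformation rules second.
    return apply_tbl(viterbi_trigram(words), rules)
```

With these toy numbers, `tag(["la", "porta", "è", "aperta"])` yields `["DET", "NOUN", "VERB", "ADJ"]`, and the rule leaves it unchanged because no NOUN follows a PRON. Applied directly, `apply_tbl(["PRON", "NOUN"], RULES)` rewrites the second tag to `VERB`. The design point the sketch tries to convey is that the rule-based second stage only has to correct residual, context-specific errors of the statistical first stage.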
EVALITA 2009. Workshop on Evaluation of NLP and Speech Tools for Italian

Use this identifier to cite or link to this document: http://hdl.handle.net/11585/88479