Malvina Nissim, Sara Castagnoli, Francesca Masini (2014). Extracting MWEs from Italian corpora: a case study for refining the POS-pattern methodology. Stroudsburg, PA : Association for Computational Linguistics.

Extracting MWEs from Italian corpora: a case study for refining the POS-pattern methodology

Malvina Nissim, Sara Castagnoli, Francesca Masini
2014

Abstract

An established method for MWE extraction is the combined use of previously identified POS-patterns and association measures. However, the selection of such POS-patterns is rarely debated. Focusing on Italian MWEs containing at least one adjective, we set out to explore how candidate POS-patterns listed in relevant literature and lexicographic sources compare with the POS sequences exhibited by statistically significant n-grams that include an adjective position, extracted from a large corpus of Italian. All literature-derived patterns are found, and new meaningful candidate patterns emerge, among the top-ranking trigrams for three association measures. We conclude that a final solid set to be used for MWE extraction will have to be further refined through a combination of association measures as well as manual inspection.
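The general idea in the abstract, ranking trigrams by an association measure and then reading off the POS sequences of the top-ranked ones, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not name the three association measures, so pointwise mutual information is used here as an assumed stand-in, and the `ADJ` tag label, the `min_count` threshold, the function names and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def trigram_pmi(tagged_tokens, min_count=2, adj_tag="ADJ"):
    """Score (word, POS) trigrams that contain an adjective with a
    three-way PMI; return (score, words, pos_pattern), best first.

    PMI is an assumed example measure, not one taken from the paper.
    """
    words = [w for w, _ in tagged_tokens]
    unigrams = Counter(words)
    trigrams = Counter(zip(tagged_tokens, tagged_tokens[1:], tagged_tokens[2:]))
    n_uni = len(words)
    n_tri = sum(trigrams.values())

    scored = []
    for tri, count in trigrams.items():
        tri_words = tuple(w for w, _ in tri)
        tri_tags = tuple(t for _, t in tri)
        # Keep only frequent trigrams with at least one adjective slot.
        if count < min_count or adj_tag not in tri_tags:
            continue
        p_joint = count / n_tri
        p_indep = math.prod(unigrams[w] / n_uni for w in tri_words)
        scored.append((math.log2(p_joint / p_indep), tri_words, tri_tags))
    return sorted(scored, reverse=True)


def pattern_counts(scored, top_k):
    """Count the POS sequences found among the top_k scored trigrams."""
    return Counter(tags for _, _, tags in scored[:top_k])


# Hypothetical toy corpus: one tagged phrase repeated three times.
toy_corpus = [
    ("investe", "VERB"), ("a", "ADP"), ("lungo", "ADJ"), ("termine", "NOUN"),
] * 3

toy_scored = trigram_pmi(toy_corpus)
toy_patterns = pattern_counts(toy_scored, top_k=2)
```

The resulting pattern counts could then be compared with a list of literature-derived patterns; on a real corpus, several measures and manual inspection would be needed, as the abstract concludes.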
Proceedings of the 10th Workshop on Multiword Expressions (MWE 2014)
pp. 57-61

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/397023