An established method for MWE extraction is the combined use of previously identified POS-patterns and association measures. However, the selection of such POS-patterns is rarely debated. Focusing on Italian MWEs containing at least one adjective, we set out to explore how candidate POS-patterns listed in relevant literature and lexicographic sources compare with the POS sequences exhibited by statistically significant n-grams that include an adjective position, extracted from a large corpus of Italian. All literature-derived patterns are found among the top-ranking trigrams for three association measures, and new meaningful candidate patterns emerge as well. We conclude that a solid final set of patterns for MWE extraction will need further refinement through a combination of association measures and manual inspection.
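The general approach summarised above (keeping n-grams with an adjective position, ranking them by an association measure, and then inspecting which POS sequences appear among the top-ranked items) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the three association measures, so pointwise mutual information is used here purely as an assumed stand-in. The tag names, the functions `score_adjective_trigrams` and `pattern_counts`, the `min_count` threshold, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def score_adjective_trigrams(tagged_tokens, min_count=2):
    """Rank trigrams that contain at least one ADJ tag by trigram PMI.

    tagged_tokens: list of (word, pos_tag) pairs in corpus order.
    PMI is an assumed, illustrative association measure.
    """
    n_tokens = len(tagged_tokens)
    unigram_counts = Counter(word for word, _ in tagged_tokens)
    trigram_counts = Counter(
        zip(tagged_tokens, tagged_tokens[1:], tagged_tokens[2:])
    )
    n_trigrams = max(n_tokens - 2, 1)

    scored = []
    for trigram, count in trigram_counts.items():
        words = tuple(word for word, _ in trigram)
        tags = tuple(tag for _, tag in trigram)
        # Keep only sufficiently frequent trigrams with an adjective position.
        if "ADJ" not in tags or count < min_count:
            continue
        p_observed = count / n_trigrams
        # Probability expected if the three words occurred independently.
        p_independent = math.prod(unigram_counts[w] / n_tokens for w in words)
        scored.append((math.log2(p_observed / p_independent), words, tags))

    scored.sort(reverse=True)  # highest association score first
    return scored


def pattern_counts(scored, top_k):
    """Count the POS sequences found among the top_k ranked trigrams."""
    return Counter(tags for _, _, tags in scored[:top_k])


# Hypothetical toy corpus; a real study would use a large POS-tagged corpus.
toy = [
    ("investimento", "NOUN"), ("a", "ADP"), ("lungo", "ADJ"),
    ("termine", "NOUN"), ("e", "CCONJ"), ("piano", "NOUN"),
    ("a", "ADP"), ("lungo", "ADJ"), ("termine", "NOUN"),
]
ranked = score_adjective_trigrams(toy, min_count=2)
top_patterns = pattern_counts(ranked, top_k=10)
```

The POS sequences collected in `top_patterns` could then be compared against a list of literature-derived patterns, with manual inspection deciding which new sequences are meaningful.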
Malvina Nissim, Sara Castagnoli, Francesca Masini (2014). Extracting MWEs from Italian corpora: a case study for refining the POS-pattern methodology. Stroudsburg, PA: Association for Computational Linguistics.
Extracting MWEs from Italian corpora: a case study for refining the POS-pattern methodology
NISSIM, MALVINA; CASTAGNOLI, SARA; MASINI, FRANCESCA
2014