Extracting MWEs from Italian corpora: a case study for refining the POS-pattern methodology

Malvina Nissim; Sara Castagnoli; Francesca Masini
2014

Abstract

An established method for MWE extraction is the combined use of previously identified POS-patterns and association measures. However, the selection of such POS-patterns is rarely debated. Focusing on Italian MWEs containing at least one adjective, we set out to explore how candidate POS-patterns listed in relevant literature and lexicographic sources compare with the POS sequences exhibited by statistically significant n-grams that include an adjective position, extracted from a large corpus of Italian. All literature-derived patterns are found among the top-ranking trigrams for three association measures, and new meaningful candidate patterns emerge there as well. We conclude that a final solid set to be used for MWE extraction will have to be further refined through a combination of association measures as well as manual inspection.
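
As a rough illustration of the kind of procedure the abstract describes, the sketch below ranks trigrams that contain an adjective position by an association score and then tallies the POS sequences of the top-ranked candidates. It is only a minimal sketch under stated assumptions: the abstract does not name the three association measures, so a trigram variant of pointwise mutual information is used here as a stand-in, and the function names, the tag labels (`ADJ`, `NOUN`, `PREP`, `DET`) and the frequency threshold are all hypothetical.

```python
import math
from collections import Counter


def rank_adj_trigrams(tagged_tokens, min_count=2, adj_tag="ADJ"):
    """Rank trigrams containing an adjective by a PMI-style score.

    tagged_tokens: list of (word, pos) pairs from a POS-tagged corpus.
    Returns a list of (score, words, pos_pattern), highest score first.
    NOTE: PMI is an assumed stand-in; the source does not name its measures.
    """
    words = [w for w, _ in tagged_tokens]
    n = len(words)
    unigrams = Counter(words)

    # Count only trigrams whose POS sequence includes an adjective position.
    trigram_counts = Counter()
    for i in range(n - 2):
        window = tagged_tokens[i:i + 3]
        pattern = tuple(pos for _, pos in window)
        if adj_tag in pattern:
            trigram_counts[(tuple(w for w, _ in window), pattern)] += 1

    n_trigrams = max(n - 2, 1)
    ranked = []
    for (tri, pattern), count in trigram_counts.items():
        if count < min_count:  # hypothetical frequency cut-off
            continue
        p_observed = count / n_trigrams
        p_independent = math.prod(unigrams[w] / n for w in tri)
        ranked.append((math.log2(p_observed / p_independent), tri, pattern))

    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def top_pos_patterns(ranked, top_k):
    """Tally the POS sequences found among the top_k ranked trigrams."""
    return Counter(pattern for _, _, pattern in ranked[:top_k])
```

The tally returned by `top_pos_patterns` could then be compared against a list of literature-derived patterns. As the abstract itself notes, a usable final set would still need several association measures and manual inspection.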
Proceedings of the 10th Workshop on Multiword Expressions (MWE 2014), pp. 57-61
Extracting MWEs from Italian corpora: a case study for refining the POS-pattern methodology / Malvina Nissim; Sara Castagnoli; Francesca Masini. - ELECTRONIC. - (2014), pp. 57-61. (Paper presented at the 10th Workshop on Multiword Expressions (MWE 2014), held in Gothenburg, Sweden, 26-27 April 2014).
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/397023