The theoretical characterisation of multiword expressions (MWEs) is tightly connected to their actual occurrences in data and to their representation in lexical resources. We present three lexical resources for Italian MWEs, namely an electronic lexicon, a series of example corpora and a database of MWEs represented around morphosyntactic patterns. These resources are matched against, and created from, a very large web-derived corpus for Italian that spans registers and domains. We can thus test expressions coded by lexicographers in a dictionary, thereby discarding unattested expressions, revisiting lexicographers' choices on the basis of frequency information, and at the same time creating an example sub-corpus for each entry. We organise MWEs on the basis of the morphosyntactic information obtained from the data in an electronic, flexible knowledge base containing structured annotation exploitable for multiple purposes. We also suggest directions for further work towards characterising MWEs by analysing the data organised in our database through lexico-semantic information available in WordNet or MultiWordNet-like resources, also with a view to expanding their set through the extraction of other similar compact expressions.

Creation of Lexical Resources for a Characterisation of Multiword Expressions in Italian / A. Zaninello; M. Nissim. - PRINT. - (2010), pp. XX-XX. (Paper presented at the conference on International Language Resources and Evaluation (LREC'10), held in Valletta, Malta, May 19-21, 2010).

Creation of Lexical Resources for a Characterisation of Multiword Expressions in Italian

NISSIM, MALVINA
2010

Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10)
A. Zaninello; M. Nissim

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/89291