Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian

Francesca Masini; 2020

Abstract

The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
Year: 2020
Published in: Proceedings of the Seventh Italian Conference on Computational Linguistics
Pages: 1-5
Authors: Francesca Masini, M. Silvia Micheli, Andrea Zaninello, Sara Castagnoli, Malvina Nissim
Files in this item:
File: Masini_et_al_2020_MWEs_we_live_by_CLIC.pdf

Open access

Description: Full article
Type: Publisher's version (PDF)
License: Creative Commons
Size: 235.42 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/802257