The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
Francesca Masini, M.S.M. (2020). Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian. Aachen: CEUR Workshop Proceedings.
Multiword expressions we live by: a validated usage-based dataset from corpora of written Italian
Francesca Masini
2020
Files in this item:
File | Size | Format | |
---|---|---|---|
Masini_et_al_2020_MWEs_we_live_by_CLIC.pdf (open access; Description: Full article; Type: Publisher's version (PDF); License: Creative Commons) | 235.42 kB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.