INDUZIONE DI CATEGORIE GRAMMATICALI E LESSICALI (Induction of grammatical and lexical categories)

GRANDI, NICOLA;TAMBURINI, FABIO
2016

Abstract

The aim of this paper is to give an ‘a-theoretical’ definition of the main parts of speech by extracting the set of categories from the actual distribution of the data, that is, from the contexts in which words occur. The definitions of the parts of speech obtained in this way depend solely on contextual information and on the analysis of distributional similarities among words, and are not conditioned by any theoretical framework. The research hypothesis is that two words which are formally and semantically similar and which share the same syntactic behavior will occur in similar contexts. Consequently, if we classify words according to their contexts of occurrence, we should expect formally and semantically similar words to fall into the same class. If we investigate a large, representative corpus of a language, we should therefore be able to extract all the parts of speech automatically by surveying the contexts of occurrence. In this article we test this approach on Italian, basing our analysis on CORIS, a representative corpus of written Italian.
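As a rough illustration of the distributional idea described in the abstract, the following minimal sketch represents each word by the counts of its immediate left and right neighbours and compares words by cosine similarity. The toy corpus, the one-word context window, the cosine measure, and the names `context_vectors` and `cosine` are all assumptions made for illustration; they are not taken from the paper and do not reproduce the authors' procedure or the CORIS data.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        # Pad with boundary markers so first and last words also get a context.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus (not drawn from CORIS).
toy_corpus = [
    "il gatto dorme",
    "il cane dorme",
    "il gatto mangia",
    "il cane mangia",
]

vectors = context_vectors(toy_corpus)
sim_nouns = cosine(vectors["gatto"], vectors["cane"])   # two nouns, same contexts
sim_mixed = cosine(vectors["gatto"], vectors["dorme"])  # noun vs. verb, no shared contexts
print(sim_nouns, sim_mixed)
```

In this toy setting the two nouns share every context and the noun and the verb share none, so grouping words by context similarity would place "gatto" and "cane" together and "dorme" elsewhere. A real study would need a far larger corpus and a proper clustering step.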
In: D'Errico, M.; Grandi, N.; Paternesi Melloni, S.; Tamburini, F., Categorie grammaticali e classi di parole. Statuto e riflessi metalinguistici, 2016, pp. 115-137

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/574846