Large corpora provide rich and comprehensive lexical information, but analyzing them can become cumbersome, particularly for lexicographic purposes. Much smaller sub-corpora can be extracted from the original corpus and analyzed to overcome this limitation. A key question, however, is what size these sub-corpora should have in order to preserve the main features of the original corpus, both qualitatively and quantitatively. We show how statistical methodologies can help determine the optimal sample size. To corroborate our findings, we consider the CREA corpus (the reference corpus of current Spanish) and, as the object of study, the adjective externo and its meanings. We show that the different meanings of this word are preserved and well represented in a much smaller sub-corpus. This is shown for three countries: Argentina, Spain and Mexico.
Hugo E. Lombardini, Silvia Bianconcini (2019). Corpus léxico y diccionario: la estricta representatividad estadística. ORILLAS RIVISTA D'ISPANISTICA, 8, 675-693.
Corpus léxico y diccionario: la estricta representatividad estadística
Hugo E. Lombardini; Silvia Bianconcini
2019
File | Size | Format |
---|---|---|
2019 Orillas Estadística.pdf (open access; publisher's PDF version; licence: Creative Commons Attribution-NonCommercial-NoDerivatives, CC BY-NC-ND) | 463.7 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.