Corpus léxico y diccionario: la estricta representatividad estadística

Hugo E. Lombardini; Silvia Bianconcini
2019

Abstract

Large corpora provide rich and comprehensive lexical information, but analyzing them can become cumbersome, particularly for lexicographic purposes. Extracting and analyzing considerably smaller sub-corpora from the original corpus can overcome this limitation. A key question, however, is how large such sub-corpora must be to preserve the main features of the original corpus, both qualitatively and quantitatively. We show how statistical methods can help determine the optimal sample size. To corroborate our findings, we consider the CREA corpus (Corpus de Referencia del Español Actual, the reference corpus of present-day Spanish) and take as our object of study the adjective externo and its meanings. We show that the different meanings of this word are preserved and well represented in a much smaller sub-corpus, for three countries: Argentina, Spain and Mexico.
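
The abstract does not say which statistical procedure is used to set the sub-corpus size. As a rough illustration of the general idea only, the sketch below applies the textbook sample-size formula for estimating a proportion (for example, the share of occurrences of a word that carry a given meaning), with a finite population correction. The function name, the default margin of error and confidence level, and the count of 10,000 occurrences are all assumptions made for the example, not values taken from the article.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(population, margin=0.05, confidence=0.95, p=0.5):
    """Occurrences to sample so that a proportion is estimated within
    +/- margin at the given confidence level.

    Illustrative textbook formula, not the procedure of the article.
    population: total occurrences of the word in the full corpus.
    p: anticipated proportion; 0.5 is the most conservative choice.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction: a smaller corpus needs fewer samples.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical count of occurrences in one country's sub-corpus.
    print(sample_size_for_proportion(10_000))
```

Under these assumptions the required sample grows slowly with the number of occurrences and levels off, which is why a much smaller sub-corpus can in principle still represent the distribution of meanings.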
Files in this record:
2019 Orillas Estadística.pdf

Open access

Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 463.7 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/706969