Quantifying the information in the long-range order of words: Semantic structures and universal linguistic constraints / Montemurro M.A. - In: CORTEX. - ISSN 0010-9452. - Electronic. - 55:1(2014), pp. 5-16. [10.1016/j.cortex.2013.08.008]

Quantifying the information in the long-range order of words: Semantic structures and universal linguistic constraints

Montemurro M. A.
Member of the Collaboration Group
2014

Abstract

We review recent progress in characterising long-range patterns of word use in language with methods from information theory. Two levels of structure are considered. The first is the pattern of word usage across different contextual domains. Applying information theory directly to quantify how specific words are to different sections of a linguistic sequence yields a measure of semantic information. Moreover, a natural scale emerges that characterises the typical size of semantic structures. Because the information measure is a sum of additive contributions from individual words, the words can be ranked by their weight in the total information. This allows the keywords most relevant to the semantic content of the sequence to be extracted without any prior knowledge of the language. The second level is the complex structure of correlations among words in linguistic sequences. The degree of order in language can be quantified by means of the entropy. Reliable estimates of the entropy were obtained from corpora of texts from several linguistic families by means of lossless compression algorithms. The value of the entropy varies across languages, since it depends on linguistic organisation at various levels. However, a relative entropy that specifically quantifies the degree of word ordering takes an almost constant value across all the linguistic families studied. This suggests that the entropy of word ordering is a novel quantitative linguistic universal. © 2013 Elsevier Ltd.
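The two quantities described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation, and every name in it is hypothetical. `word_information` assumes the text is cut into equal-sized parts and uses a plain plug-in estimate of the information between words and parts, with no correction against a shuffled baseline, so rare words will be overrated. `word_order_information` assumes `bz2` as the lossless compressor and compares the compressed size of the text with that of a word-shuffled copy, which is only a rough stand-in for a relative entropy of word ordering.

```python
import bz2
import math
import random
from collections import Counter


def word_information(tokens, n_parts):
    """Per-word additive contribution (in bits) to the plug-in information
    between word identity and the part of the text a word occurs in.
    Assumption: equal-sized parts, no shuffled-baseline correction."""
    part_len = len(tokens) // n_parts
    used = tokens[: part_len * n_parts]  # drop the ragged tail
    total = Counter(used)
    per_part = [
        Counter(used[j * part_len:(j + 1) * part_len]) for j in range(n_parts)
    ]
    p_part = 1.0 / n_parts  # every part has the same length
    scores = {}
    for word, n_w in total.items():
        divergence = 0.0
        for counts in per_part:
            p_part_given_word = counts[word] / n_w
            if p_part_given_word > 0:
                divergence += p_part_given_word * math.log2(p_part_given_word / p_part)
        scores[word] = (n_w / len(used)) * divergence  # weight by p(word)
    return scores


def compressed_bits(tokens):
    """Size in bits of the bz2-compressed, space-joined token sequence."""
    return 8 * len(bz2.compress(" ".join(tokens).encode("utf-8")))


def word_order_information(tokens, seed=0):
    """Bits per word saved by the compressor on the original sequence
    relative to a copy with the same words in random order."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return (compressed_bits(shuffled) - compressed_bits(tokens)) / len(tokens)


toy = "the cat the cat the dog the dog".split()
print(word_information(toy, n_parts=2))
# {'the': 0.0, 'cat': 0.25, 'dog': 0.25}
```

In the toy sequence, "the" is spread evenly over both halves and scores zero, while "cat" and "dog" are each confined to one half and share the total of 0.5 bits. Sorting the scores in descending order gives a keyword ranking in the sense of the abstract. The compression-based estimate is only meaningful on long texts, where the compressor's fixed overhead is negligible.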
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/770429