Complexity and Universality in the Long-Range Order of Words

Marcelo Alejandro Montemurro
Member of the Collaboration Group
2016

Abstract

As is the case with many signals produced by complex systems, language presents a statistical structure that is balanced between order and disorder. Here we review and extend recent results from quantitative characterisations of the degree of order in linguistic sequences, which give insight into two relevant aspects of language: the presence of statistical universals in word ordering, and the link between semantic information and statistical linguistic structure. We first analyse a relative-entropy measure that assesses how much the ordering of words contributes to the overall statistical structure of language. This measure takes an almost constant value, close to 3.5 bits/word, across several linguistic families. We then show that a direct application of information theory leads to an entropy measure that can quantify semantic structures and extract keywords from linguistic samples, even without prior knowledge of the underlying language.
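The two measures summarised above can be sketched concretely. The Python sketch below is only an illustration under simplifying assumptions, not the estimators used in the chapter: the contribution of word ordering is approximated by comparing the compressed size of the original word sequence against randomly shuffled copies, and keyword relevance is scored by how much lower each word's entropy over equal-sized sections of the text is than a shuffled baseline. The function names, the number of sections, the number of shuffles and the compression proxy are choices made here for illustration only.

import bz2
import math
import random
from collections import Counter


def ordering_information(words, n_shuffles=5):
    """Rough per-word estimate (in bits) of how much word ordering reduces
    the entropy of a text: compressed size of the original sequence versus
    the average compressed size of randomly shuffled copies."""
    def compressed_bits(seq):
        return 8 * len(bz2.compress(" ".join(seq).encode("utf-8")))

    original = compressed_bits(words)
    shuffled_total = 0
    for _ in range(n_shuffles):
        perm = list(words)          # copy, then destroy the word order
        random.shuffle(perm)
        shuffled_total += compressed_bits(perm)
    return (shuffled_total / n_shuffles - original) / len(words)


def keyword_scores(words, n_parts=32, n_shuffles=5):
    """Score each word by how much more concentrated it is across n_parts
    equal sections of the text than in a shuffled baseline; strongly
    clustered words tend to be topical keywords."""
    def part_entropies(seq):
        size = max(1, len(seq) // n_parts)
        per_word = {}
        for i, w in enumerate(seq):
            part = min(i // size, n_parts - 1)
            per_word.setdefault(w, Counter())[part] += 1
        entropies = {}
        for w, counts in per_word.items():
            total = sum(counts.values())
            entropies[w] = -sum((c / total) * math.log2(c / total)
                                for c in counts.values())
        return entropies

    h_orig = part_entropies(words)

    # Baseline: average part-entropy of each word after shuffling the text.
    h_base = Counter()
    for _ in range(n_shuffles):
        perm = list(words)
        random.shuffle(perm)
        for w, h in part_entropies(perm).items():
            h_base[w] += h / n_shuffles

    freq = Counter(words)
    n_total = len(words)
    # Entropy lost by the real, clustered placement of each word,
    # weighted by how often the word occurs.
    return {w: (freq[w] / n_total) * (h_base[w] - h_orig[w]) for w in h_orig}

As a usage example, scores = keyword_scores(text.split()) followed by sorted(scores, key=scores.get, reverse=True)[:20] lists the most strongly clustered words, which in practice tend to be content words tied to the text's topics; the output of ordering_information can likewise be compared with the roughly 3.5 bits/word quoted in the abstract, bearing in mind that compression-based estimates are only approximate.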
Lecture Notes in Morphogenesis: Creativity and Universality in Language, 2016, pp. 27–41

Zanette D.H., Montemurro M.A. (2016). Complexity and Universality in the Long-Range Order of Words. In: Degli Esposti M., Altmann E., Pachet F. (eds), Creativity and Universality in Language, Lecture Notes in Morphogenesis, pp. 27–41 [10.1007/978-3-319-24403-7_3].

Damian H. Zanette; Marcelo Alejandro Montemurro

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/770615