Visual characterization of biomedical texts with word entropy

Degli Esposti, Mirko
2010

Abstract

Recently, the relation between the entropy of words (a new measure from Information Theory introduced by Montemurro in 2001) and the role of words in literary texts has been shown, as has the capacity of entropy to cluster words. Our final goal is to investigate whether, and how, the list of words ranked by entropy can be useful in more practical contexts, such as information retrieval tasks or the automatic classification of biomedical textual data. In this work, we analyze how effectively the keywords selected by Montemurro's approach capture the semantics behind biomedical text collections, and we use the spectrum of words to offer a visual representation of the texts' content. In addition, we compare the resulting keyword lists with those obtained with the TF-IDF measure, and discuss some of the most interesting findings of this comparison.
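As a rough illustration of the two rankings being compared, the Python sketch below computes a Montemurro-style word entropy: the text is cut into P equal parts, and the normalized Shannon entropy of a word's occurrences over those parts measures how evenly the word is spread. Words that are clustered (low entropy compared with a randomly shuffled copy of the same text) are ranked as keywords, and the result is set against a TF-IDF ranking over a document collection. This is a minimal sketch, not the authors' implementation: the number of parts P, the number of shuffles, the frequency weighting of the entropy deficit, and the max-over-documents aggregation of TF-IDF are all assumptions, and every function name is hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def split_into_parts(tokens, P):
    """Cut a token list into P contiguous parts of (nearly) equal length."""
    if P < 2 or len(tokens) < P:
        raise ValueError("need P >= 2 and at least P tokens")
    size = len(tokens) // P
    # The last part absorbs any remainder tokens.
    return [tokens[i * size:(i + 1) * size] for i in range(P - 1)] + [tokens[(P - 1) * size:]]


def word_entropy(tokens, P):
    """Normalized entropy S(w) = -(1/ln P) * sum_j p_j ln p_j, with p_j = n_j / n.

    S(w) is 1 when w is spread evenly over the P parts and 0 when it
    occurs in a single part.
    """
    part_counts = [Counter(part) for part in split_into_parts(tokens, P)]
    entropy = {}
    for w, n in Counter(tokens).items():
        s = 0.0
        for counts in part_counts:
            if counts[w]:
                p = counts[w] / n
                s -= p * math.log(p)
        entropy[w] = s / math.log(P)
    return entropy


def entropy_keywords(tokens, P=4, shuffles=20, seed=0, min_count=2):
    """Rank words by their entropy deficit relative to shuffled text.

    Assumed score (hypothetical weighting): (n_w / N) * (<S_shuffled(w)> - S(w)).
    """
    observed = word_entropy(tokens, P)
    rng = random.Random(seed)
    shuffled = list(tokens)
    baseline = defaultdict(float)  # mean entropy of each word in shuffled text
    for _ in range(shuffles):
        rng.shuffle(shuffled)  # a random permutation destroys word clustering
        for w, s in word_entropy(shuffled, P).items():
            baseline[w] += s / shuffles
    counts = Counter(tokens)
    N = len(tokens)
    score = {w: (n / N) * (baseline[w] - observed[w])
             for w, n in counts.items() if n >= min_count}
    return sorted(score, key=score.get, reverse=True)


def tfidf_keywords(docs):
    """Rank terms by their highest TF-IDF score over a list of tokenized documents."""
    df = Counter(w for doc in docs for w in set(doc))
    best = {}
    for doc in docs:
        for w, c in Counter(doc).items():
            score = (c / len(doc)) * math.log(len(docs) / df[w])
            best[w] = max(best.get(w, 0.0), score)
    return sorted(best, key=best.get, reverse=True)


def top_k_overlap(a, b, k):
    """Fraction of the top-k items shared by two ranked lists."""
    return len(set(a[:k]) & set(b[:k])) / k


if __name__ == "__main__":
    # "gene" is concentrated in the first quarter; the other words are spread out.
    tokens = ["gene"] * 8 + ["the", "of", "cell", "data"] * 6
    print(entropy_keywords(tokens)[0])  # gene
```

A word concentrated in a few parts of a text (typically a topical term) has low entropy compared with its shuffled baseline and rises to the top of the list, while words spread uniformly, such as function words, fall to the bottom. `top_k_overlap` is just one simple way to quantify how much the entropy-based and TF-IDF keyword lists agree.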
Published in: Network Tools and Applications in Biology (NETTAB 2010), Biological Wikis, pp. 139-142
Authors: M. Degli Esposti; R. Danger; P. Rosso; S. Garcia-Blasco

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/106108
Warning: the data displayed have not been validated by the university.
