Dynamics of Style and the Case of the Diario Postumo by Eugenio Montale: A Quantitative Approach

DEGLI ESPOSTI, MIRKO
2016

Abstract

Here we address a concrete problem concerning the integrity of a very specific textual corpus: the 84 poems that now form, after a troubled journey, the so-called Diario Postumo (DP) by Eugenio Montale. Our approach is simple to describe. It is based on two distinct methods for measuring similarity between texts, that is, two different algorithms that, given any pair of texts, return a positive number that is smaller for more similar texts. The first measure, called entropic distance (or lzwe), uses cross entropy to measure differences between sequences of symbols, an idea borrowed from data compression theory. The second measure, called n-gram distance, is based on the frequencies of sequences of n consecutive characters. Both distances have been described elsewhere, but for completeness we report their precise definitions in the Appendix. The main purpose of our analysis is to test whether purely automatic and quantitative methods can reveal anomalies in the poems forming the DP. Such anomalies exist, and they are consistent with the hypothesis that they are the result of several elaborations of authentic Montale material, originally created and recorded in different forms. Our research on the DP is part of a wider research project that explores the possibility of combining qualitative philological methods with quantitative mathematical approaches to solve problems in authorship attribution (A.A.), forgery detection and integrity testing of textual corpora.
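The two kinds of distance mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the precise definitions are given in the Appendix of the chapter, which is not reproduced in this record. The function names, the default n = 3, the relative-difference formula for n-grams, and the use of `zlib` as a stand-in for the lzwe cross-entropy estimator are all assumptions made for illustration only.

```python
import zlib
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_distance(a: str, b: str, n: int = 3) -> float:
    """Illustrative n-gram distance (assumed formula, not the paper's).

    Averages the squared relative difference of n-gram frequencies
    over all n-grams seen in either text. 0.0 means identical profiles,
    1.0 means the texts share no n-gram.
    """
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    ta, tb = sum(pa.values()), sum(pb.values())
    if ta == 0 or tb == 0:
        raise ValueError("both texts must contain at least n characters")
    grams = set(pa) | set(pb)
    total = 0.0
    for g in grams:
        fa, fb = pa[g] / ta, pb[g] / tb
        total += ((fa - fb) / (fa + fb)) ** 2
    return total / len(grams)


def compressed_size(data: bytes) -> int:
    """Size in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def compression_distance(a: str, b: str) -> float:
    """Rough compression-based proxy for a cross-entropy distance.

    Measures how many extra compressed bytes b costs when appended to a,
    per byte of b. zlib is a swapped-in technique here, not lzwe.
    """
    xa, xb = a.encode("utf-8"), b.encode("utf-8")
    if not xb:
        raise ValueError("second text must not be empty")
    return (compressed_size(xa + xb) - compressed_size(xa)) / len(xb)
```

Both functions return smaller values for more similar texts, matching the convention stated in the abstract. Note that the compression-based proxy is not symmetric in its arguments, and that zlib's limited look-back window makes it unreliable for long reference texts.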
Published in: Creativity and Universality in Language, pp. 157-176 (2016)
Authors: Benedetto, Dario; Degli Esposti, Mirko

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/586769