Several efficient and very powerful algorithms exist for detecting changes in tree-based textual documents, such as those encoded in XML. However, one important aspect is still underestimated in their design and implementation: the quality of the output, in terms of readability, clarity and accuracy for human users. This requirement is particularly relevant when diff-ing literary documents, such as books, articles, reviews, legislative acts, and so on. This paper introduces the concept of 'naturalness' in diff-ing tree-based textual documents, and discusses a new, extensible set of changes that can and should be detected. A naturalness-based algorithm is presented, together with its application to diff-ing XML-encoded legislative documents. The algorithm, called JNDiff, proved to be very efficient and to detect significantly better matchings, since it recognizes new operations.
Title: A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents
Author(s): Di Iorio, Angelo; Schirinzi, M.; Vitali, Fabio; Marchetti, C.
Year: 2009
Book title: Lecture Notes in Business Information Processing
First page: 90
Last page: 101
Date of final record in UGOV: 1-Mar-2010
Appears in categories: 4.01 Contribution in conference proceedings