A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents / A. Di Iorio; M. Schirinzi; F. Vitali; C. Marchetti. - PRINT. - 24:(2009), pp. 90-101. (Paper presented at the conference: Proceedings of the 11th International Conference on Enterprise Information Systems, ICEIS 2009, held in Milan, Italy, on May 6-10, 2009.)
A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents
DI IORIO, ANGELO; VITALI, FABIO
2009
Abstract
Several efficient and powerful algorithms exist for detecting changes in tree-based textual documents, such as those encoded in XML. One important aspect is still underestimated in their design and implementation: the quality of the output, in terms of readability, clarity and accuracy for human users. This requirement is particularly relevant when diff-ing literary documents such as books, articles, reviews, acts, and so on. This paper introduces the concept of 'naturalness' in diff-ing tree-based textual documents, and discusses a new, extensible set of changes that can and should be detected. A naturalness-based algorithm is presented, together with its application to diff-ing XML-encoded legislative documents. The algorithm, called JNDiff, proved to detect significantly better matchings (since new operations are recognized) and to be very efficient.