Several efficient and powerful algorithms exist for detecting changes in tree-based textual documents, such as those encoded in XML. One important aspect is still underestimated in their design and implementation: the quality of the output in terms of readability, clarity and accuracy for human users. This requirement is particularly relevant when diff-ing literary documents, such as books, articles, reviews, acts, and so on. This paper introduces the concept of 'naturalness' in diff-ing tree-based textual documents and discusses a new, extensible set of changes that can and should be detected. A naturalness-based algorithm is presented, along with its application to diff-ing XML-encoded legislative documents. The algorithm, called JNDiff, proved to detect significantly better matchings (since new operations are recognized) and to be very efficient.

A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents / A. Di Iorio; M. Schirinzi; F. Vitali; C. Marchetti. - PRINT. - 24:(2009), pp. 90-101. (Paper presented at the 11th International Conference on Enterprise Information Systems, ICEIS 2009, held in Milan, Italy, May 6-10, 2009.)

A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents

DI IORIO, ANGELO; VITALI, FABIO
2009

Year: 2009
Series: Lecture Notes in Business Information Processing
Pages: 90-101
A. Di Iorio; M. Schirinzi; F. Vitali; C. Marchetti

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/87829