Overlapping structures in XML are not symptoms of a misunderstanding of the intrinsic characteristics of a text document nor evidence of extreme scholarly requirements far beyond those needed by the most common XML-based applications. On the contrary, overlaps have started to appear in a large number of incredibly popular applications hidden under the guise of syntactical tricks to the basic hierarchy of the XML data format. Unfortunately, syntactical tricks have the drawback that the affected structures require complicated workarounds to support even the simplest query or usage. In this article, we present Extremely Annotational Resource Description Framework (RDF) Markup (EARMARK), an approach to overlapping markup that simplifies and streamlines the management of multiple hierarchies on the same content, and provides an approach to sophisticated queries and usages over such structures without the need of ad-hoc applications, simply by using Semantic Web tools and languages. We compare how relevant tasks (e.g., the identification of the contribution of an author in a word processor document) are of some substantial complexity when using the original data format and become more or less trivial when using EARMARK. We finally evaluate positively the memory and disk requirements of EARMARK documents in comparison to Open Office and Microsoft Word XML-based formats.
Di Iorio A., Peroni S., Vitali F. (2011). A Semantic Web Approach To Everyday Overlapping Markup. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 62, 1696-1716 [10.1002/asi.21591].
A Semantic Web Approach To Everyday Overlapping Markup
DI IORIO, ANGELO;PERONI, SILVIO;VITALI, FABIO
2011
Abstract
Overlapping structures in XML are not symptoms of a misunderstanding of the intrinsic characteristics of a text document nor evidence of extreme scholarly requirements far beyond those needed by the most common XML-based applications. On the contrary, overlaps have started to appear in a large number of incredibly popular applications hidden under the guise of syntactical tricks to the basic hierarchy of the XML data format. Unfortunately, syntactical tricks have the drawback that the affected structures require complicated workarounds to support even the simplest query or usage. In this article, we present Extremely Annotational Resource Description Framework (RDF) Markup (EARMARK), an approach to overlapping markup that simplifies and streamlines the management of multiple hierarchies on the same content, and provides an approach to sophisticated queries and usages over such structures without the need of ad-hoc applications, simply by using Semantic Web tools and languages. We compare how relevant tasks (e.g., the identification of the contribution of an author in a word processor document) are of some substantial complexity when using the original data format and become more or less trivial when using EARMARK. We finally evaluate positively the memory and disk requirements of EARMARK documents in comparison to Open Office and Microsoft Word XML-based formats.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.