Evaluating collections of XML documents without paying attention to the schema they were written in may give interesting insights into the expected characteristics of a markup language, as well as into regularities that span vocabularies and languages and that are more fundamental and frequent than plain content models. In this paper we explore the idea of structural patterns in XML vocabularies by examining the characteristics of elements as they are used, rather than as they are defined. We introduce from the ground up a formal theory of 8 plus 3 structural patterns for XML elements, and verify their identifiability in a number of different XML vocabularies. The results allowed the creation of visualization and content extraction tools that are completely independent of the schema and require no previous knowledge of the semantics and organization of the XML vocabulary of the documents.

Dealing with structural patterns of XML documents / Di Iorio, Angelo; Peroni, Silvio; Poggi, Francesco; Vitali, Fabio. - In: JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY. - ISSN 2330-1643. - PRINT. - 65:9(2014), pp. 1884-1900. [10.1002/asi.23088]

Dealing with structural patterns of XML documents

DI IORIO, ANGELO;PERONI, SILVIO;POGGI, FRANCESCO;VITALI, FABIO
2014

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/521151