Managing semantics in XML vocabularies: an experience in the legal and legislative domain

Barabucci, Gioele; Cervone, Luca; Di Iorio, Angelo; Palmirani, Monica; Peroni, Silvio; Vitali, Fabio

Akoma Ntoso is an XML vocabulary for legal and legislative documents whose primary objective is to provide semantic information on top of a received legal text. Akoma Ntoso focuses on three key aspects of legal documents: identification of structures, references to other legal documents, and storage of non-authoritative annotations. Structures are identified and marked up according to an XML vocabulary based on common patterns found in legal documents. References to legal documents across countries are made using a common naming convention based on URIs. Third-party annotations and interpretations (broadly called metadata) are stored using an ontologically sound approach compatible with Topic Maps [15], OWL [19] and GRDDL [4]. The XML documents created according to the Akoma Ntoso specifications use a layered structure where each layer addresses a single problem: the text layer provides a faithful representation of the original content of the legal text, the structure layer provides a hierarchical organization of the parts present in the text layer, and the metadata layer associates information from the underlying layers with ontological information. Whenever this semantic information is the result of a subjective interpretation, Akoma Ntoso allows multiple, independent opinions to be stored formally within the document and to be used alternatively, cumulatively, or in comparison with each other. The layered structure of Akoma Ntoso is an attempt at balancing extensibility, needed to accommodate the specific needs of individual countries, with clarity and self-explanatoriness, both needed for the preservation of legal digital resources over time (even long spans of time, measured in decades or centuries). Both these aspects have been evaluated taking into account the fact that long-term preservation of Akoma Ntoso documents must be possible even without access to the extensive original documentation.
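The three layers described above can be illustrated with a minimal Python sketch. The fragment below is a simplified, hypothetical document: the element and attribute names (`act`, `analysis`, `note`, `source`, `refersTo`) and the two editors are assumptions made for illustration and are not taken from the normative Akoma Ntoso schema. It only shows how the received text, the structure, and several independent interpretations might be read separately from one file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. Element and attribute names are
# illustrative assumptions, not the normative Akoma Ntoso schema.
DOC = """
<act>
  <meta>
    <analysis source="#editorA">
      <note refersTo="#sec1" value="in force"/>
    </analysis>
    <analysis source="#editorB">
      <note refersTo="#sec1" value="repealed"/>
    </analysis>
  </meta>
  <body>
    <section id="sec1">
      <num>1.</num>
      <content>No person shall be deprived of property.</content>
    </section>
  </body>
</act>
"""

root = ET.fromstring(DOC)


def received_text(doc):
    """Text layer: the words found under <body>, with <meta> ignored."""
    body = doc.find("body")
    # Collapse the whitespace introduced by XML indentation.
    return " ".join(" ".join(body.itertext()).split())


def structure_ids(doc):
    """Structure layer: identifiers of the sections, in document order."""
    return [s.get("id") for s in doc.find("body").iter("section")]


def opinions(doc, target):
    """Metadata layer: every interpretation attached to one structure,
    keyed by the (hypothetical) source that asserted it."""
    found = {}
    for analysis in doc.iter("analysis"):
        for note in analysis.iter("note"):
            if note.get("refersTo") == "#" + target:
                found[analysis.get("source")] = note.get("value")
    return found


text = received_text(root)
sections = structure_ids(root)
views = opinions(root, "sec1")
```

In this sketch the two conflicting opinions about section `sec1` coexist in the metadata, while `received_text` returns the same words regardless of how many interpretations are added.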
The same layered structure creates a strict separation between the content that has been approved by the body empowered by law to endorse it (data) and what has been added by other parties (metadata). This separation significantly helps the development of tools able to preserve and guarantee the authenticity of the processed legal document, favouring trust towards e-government initiatives. In fact, Akoma Ntoso XML documents can be managed in any step of the legislative or judiciary life cycle (for instance, in the publishing phase) without any modification to the received text. While Akoma Ntoso imposes an (extensible) XML vocabulary, it does not prescribe the use of a particular ontology. Instead, Akoma Ntoso defines a minimal and loose ontology based on a few Top Level Classes (TLCs), e.g., Person, Role, and Concept. These classes are only generic groupings of instances: no particular property is defined for any of them. Inside an Akoma Ntoso document, one section of the metadata links pieces of text with the appropriate TLC instances, while another section combines these instances to create complex relations. To perform elaborate computations on a document or on a collection of documents, more precise ontologies have to be used and linked with the provided metadata. For example, we may be interested in using the FRBR (Functional Requirements for Bibliographic Records) ontology to associate some of the document metadata describing legislative documents with FRBR concepts like Work or Expression of a Work. Another example is the representation of individual persons: instances of the TLC Person class may be associated with instances of the Person class of the FOAF (Friend of a Friend) ontology or with instances of the Creator class of the Dublin Core ontology. Akoma Ntoso allows the use of these and of future ontologies.
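As an illustration of how TLC instances might be linked to text and then aligned with an external ontology, consider the following Python sketch. The XML fragment, the identifiers, the `href` values, and the `EXTERNAL_CLASS` table are hypothetical and chosen only for this example; the FOAF class URI is the one published by the FOAF vocabulary. This is one possible reading of the mechanism, not the processing model defined by Akoma Ntoso.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one metadata section declares TLC instances,
# and the text points to them. Names and URIs are illustrative only.
FRAGMENT = """
<doc>
  <meta>
    <references>
      <TLCPerson id="smith" href="/ontology/person/example.smith" showAs="J. Smith"/>
      <TLCRole id="minister" href="/ontology/role/example.minister" showAs="Minister"/>
    </references>
  </meta>
  <body>
    <p>Signed by <person refersTo="#smith">the Minister, Mr Smith</person>.</p>
  </body>
</doc>
"""

# Assumed alignment between a TLC and a more precise external class.
# Akoma Ntoso itself does not prescribe this table.
EXTERNAL_CLASS = {
    "TLCPerson": "http://xmlns.com/foaf/0.1/Person",
}


def tlc_index(doc):
    """Map each declared identifier to its TLC element."""
    return {el.get("id"): el for el in doc.find("meta/references")}


def resolve_mentions(doc):
    """Follow every refersTo pointer in the body to its TLC instance."""
    index = tlc_index(doc)
    resolved = []
    for el in doc.find("body").iter():
        target = el.get("refersTo")
        if target is None:
            continue
        tlc = index[target.lstrip("#")]
        resolved.append({
            "text": "".join(el.itertext()),
            "tlc_class": tlc.tag,
            "instance": tlc.get("href"),
            # None when no external alignment has been supplied.
            "external_class": EXTERNAL_CLASS.get(tlc.tag),
        })
    return resolved


root = ET.fromstring(FRAGMENT)
mentions = resolve_mentions(root)
```

Because the TLCs carry no properties of their own, swapping in a different ontology in this sketch only means changing the `EXTERNAL_CLASS` table; the document itself is left untouched.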
Even if one relies only on the bare knowledge provided by the Akoma Ntoso minimal ontology and by its document markup, there are many interesting queries that can be carried out using only the origina...
