Mapping complex structured data to RDF, e.g. for the creation of linked data, requires a clear understanding not only of the data but also of the paradigm used by the mapping tool. We illustrate this with an empirical study comparing two mapping tools, with particular attention to the likelihood of user error. One tool uses path descriptions, e.g. JSONPath or XPath, to access data elements; the other uses a default triplification which can be queried, e.g. with SPARQL. As an example of the former, the study used YARRRML to map from CSV, JSON and XML to RDF. As an example of the latter, the study used SPARQL Anything, an extension of SPARQL, to query the same data and CONSTRUCT a set of triples. The study was qualitative: we observed the kinds of errors participants made when using the two tools on identical mapping tasks, and used a grounded approach to categorize these errors. Whilst some difficulties are common to the two tools, others are specific to each. For each tool, we present recommendations which help ensure that the mapping code is consistent with the data and the desired RDF. We propose future developments to reduce the difficulty users experience with YARRRML and SPARQL Anything. We also make some general recommendations about the future development of mapping tools and techniques. Finally, we propose some research questions for future investigation.

Warren, P., Mulholland, P., Daga, E., Asprino, L. (2024). Path-based and triplification approaches to mapping data into RDF: User behaviours and recommendations. SEMANTIC WEB, 15(6), 2479-2505 [10.3233/sw-243585].

Path-based and triplification approaches to mapping data into RDF: User behaviours and recommendations

Asprino, Luigi
2024

Warren, Paul; Mulholland, Paul; Daga, Enrico; Asprino, Luigi
Files in this item:
Path based_Semantic Web_2024.pdf

open access

Type: Publisher's version (PDF) / Version of Record
License: Creative Commons
Size: 2.08 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1005753