To Each His Own: Accommodating Data Variety by a Multimodel Star Schema

Stefano Rizzi
2020

Abstract

Recent approaches adopt multimodel databases (MMDBs) to natively handle the variety issues arising from the increasing amounts of heterogeneous data (structured, semi-structured, graph-based, etc.) made available. However, when it comes to analyzing these data, traditional data warehouses (DWs) and OLAP systems fall short because they rely on relational Database Management Systems (DBMSs) for storage and querying, thus constraining data variety into the rigidity of a structured schema. This paper provides a preliminary investigation of the performance of an MMDB when used to store multidimensional data for OLAP analysis. A multimodel DW would store each of its elements according to its native model; among the benefits we envision for this solution are bridging the architectural gap between data lakes and DWs, reducing the cost of ETL data transformations, and ensuring better flexibility, extensibility, and evolvability thanks to the use of schemaless models. To support our investigation we present an implementation, based on the UniBench benchmark dataset, that extends a star schema with JSON, XML, spatial, and key-value data; we also define a sample OLAP workload and use it to test the performance of our solution and compare it with that of a classical star schema. As expected, the full-relational implementation performs better, but we believe that this gap could be balanced by the benefits of multimodel in dealing with variety. Finally, we give our perspective on research on this topic.
Year: 2020
Published in: Proceedings of the 22nd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data co-located with EDBT/ICDT 2020 Joint Conference, DOLAP@EDBT/ICDT 2020
Pages: 66-73
Authors: Sandro Bimonte, Yassine Hifdi, Mohammed Maliari, Patrick Marcel, Stefano Rizzi
Files in this item:
File: dolap20-M3D.pdf (open access)
Type: Publisher's version (PDF)
License: Open Access License. Creative Commons Attribution (CC BY)
Size: 1.15 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/752105
Citations
  • Scopus: 2