Answering GPSJ queries in a polystore: A dataspace-based approach

Ben Hamadou, H.; Gallinucci, E.; Golfarelli, M.

doi:10.1007/978-3-030-33223-5_16

The discipline of data science is steering analysts away from traditional data warehousing and towards a more flexible and lightweight approach to data analysis. The idea is to perform OLAP analyses in a pay-as-you-go manner across heterogeneous schemas and data models, where the integration is progressively carried out by the user as the available data is explored. In this paper, we propose an approach to support data analysis within a polystore supporting relational, document and column data models by automatically handling both data model and schema heterogeneity through a dataspace layer on top of the underlying databases. The expressiveness we enable corresponds to GPSJ queries, which are the most common class of queries in OLAP applications. We rely on Nested Relational Algebra to define a cross-database execution plan. The plan is composed of several local plans, to be executed on the distinct databases, and a global plan, which combines and possibly aggregates inter-database data. The system has been prototyped on Apache Spark.

Ben Hamadou, H., Gallinucci, E., Golfarelli, M. (2019). Answering GPSJ queries in a polystore: A dataspace-based approach. Springer [10.1007/978-3-030-33223-5_16].

Answering GPSJ queries in a polystore: A dataspace-based approach

Ben Hamadou H.;Gallinucci E.;Golfarelli M.

2019

Abstract

The discipline of data science is steering analysts away from traditional data warehousing and towards a more flexible and lightweight approach to data analysis. The idea is to perform OLAP analyses in a pay-as-you-go manner across heterogeneous schemas and data models, where the integration is progressively carried out by the user as the available data is explored. In this paper, we propose an approach to support data analysis within a polystore supporting relational, document and column data models by automatically handling both data model and schema heterogeneity through a dataspace layer on top of the underlying databases. The expressiveness we enable corresponds to GPSJ queries, which are the most common class of queries in OLAP applications. We rely on Nested Relational Algebra to define a cross-database execution plan. The plan is composed of several local plans, to be executed on the distinct databases, and a global plan, which combines and possibly aggregates inter-database data. The system has been prototyped on Apache Spark.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2019
			
	Titolo del volume
	
				Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
			
	Pagina iniziale
	
				189
			
	Pagina finale
	
				203
			
	Collana/Serie
	
				LECTURE NOTES IN ARTIFICIAL INTELLIGENCE
			
	Codice DOI
	
				https://dx.doi.org/10.1007/978-3-030-33223-5_16
			
	Citazione
	
				Ben Hamadou, H., Gallinucci, E., Golfarelli, M. (2019). Answering GPSJ queries in a polystore: A dataspace-based approach. Springer [10.1007/978-3-030-33223-5_16].
			
	Tutti gli autori
	
						Ben Hamadou, H.; Gallinucci, E.; Golfarelli, M.
					
	Appare nelle tipologie:
	
				4.01 Contributo in Atti di convegno

File in questo prodotto:

Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/721804

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

18

14

ND

CRIS Current Research Information System