Forresi, C., Francia, M., Gallinucci, E., Golfarelli, M. (2023). Cost-based Optimization of Multistore Query Plans. INFORMATION SYSTEMS FRONTIERS, 25(5), 1925-1951 [10.1007/s10796-022-10320-2].

Cost-based Optimization of Multistore Query Plans

Forresi, C; Francia, M; Gallinucci, E; Golfarelli, M

2023

Abstract

Multistores are data management systems that enable query processing across different and heterogeneous databases; besides the distribution of data, complexity factors like schema heterogeneity and data replication must be resolved through integration and data fusion activities. Our multistore solution relies on a dataspace to provide the user with an integrated view of the available data and enables the formulation and execution of GPSJ queries. In this paper, we propose a technique to optimize the execution of GPSJ queries by formulating and evaluating different execution plans on the multistore. In particular, we outline different strategies to carry out joins and data fusion by relying on different schema representations; then, a self-learning black-box cost model is used to estimate execution times and select the most efficient plan. The experiments assess the effectiveness of the cost model in choosing the best execution plan for the given queries and exploit multiple multistore benchmarks to investigate the factors that influence the performance of different plans.

Year	2023
Journal	INFORMATION SYSTEMS FRONTIERS
DOI	https://dx.doi.org/10.1007/s10796-022-10320-2
All authors	Forresi, C; Francia, M; Gallinucci, E; Golfarelli, M
Appears in types:	1.01 Journal article

Files in this item:

File	Size	Format
s10796-022-10320-2.pdf (open access; type: publisher's version (PDF) / Version of Record; license: open access, Creative Commons Attribution (CC BY))	4 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/897260
