Efficient Spark-based framework for big geospatial data query processing and analysis

Al Jawarneh, Isam Mashhour; Bellavista, Paolo; Corradi, Antonio; Montanari, Rebecca; Foschini, Luca
2017

Abstract

The volume of geospatial data has grown exponentially and at an accelerating pace, motivating the scientific community to examine novel parallel technologies for tuning the performance of spatial queries. Managing spatial data for optimized query performance is particularly challenging because of the growing complexity of the geometric computations involved in querying spatial data, a workload to which traditional systems have failed to scale effectively. However, the use of large-scale parallel computing infrastructures based on cost-effective commodity clusters and cloud computing environments introduces new management challenges, such as avoiding bottlenecks from overloaded scarce computing resources, which may be caused by an unbalanced distribution of parallel tasks. In this paper, we aim to fill those gaps by introducing a generic framework for optimizing the performance of big spatial data queries on top of Apache Spark. Our framework also supports advanced management functions, including a unique self-adaptable load-balancing service that self-tunes framework execution. Our experimental evaluation shows that the framework is scalable and efficient for querying massive real-world spatial datasets.
Year: 2017
Published in: Proceedings - IEEE Symposium on Computers and Communications
Pages: 851-856
Authors: Al Jawarneh, Isam Mashhour; Bellavista, Paolo; Corradi, Antonio; Montanari, Rebecca; Foschini, Luca; Zanotti, Andrea
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/619484