Data modeling in NoSQL databases is notoriously complex and driven by multiple, possibly conflicting, requirements. Researchers have proposed methodologies to optimize the schema design of a given domain for a given workload; however, due to the agile environment in which NoSQL databases are usually employed, both domain and workload frequently change and evolve, possibly neutralizing the benefits of optimization. When this happens, the benefits of a new optimal schema design must be weighed against the costs of migrating the data. In this work, we empirically show the benefits of schema redesign in a real, publicly available database. In particular, we identify multiple snapshots (in terms of domain extension and querying workload) in the 20+-year evolution of SkyServer, demonstrate how NoSQL schema optimization at a given time can later backfire, and evaluate the conditions under which data migration becomes beneficial. This leads us to define the foundations and challenges of a framework for continuous NoSQL database refactoring, with the goal of helping DBAs and data engineers decide if, when, and how a NoSQL database should be reconsidered to restore schema design optimality.
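The trade-off described above (the benefit of a new schema versus the cost of migrating the data) can be illustrated with a simple break-even rule. The abstract does not specify the paper's cost model, so the following is only a hypothetical sketch: it assumes each schema has a constant average cost per workload query and that migration is a one-off cost. The names `SchemaOption`, `break_even_queries`, and `should_migrate` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class SchemaOption:
    """Hypothetical cost profile of one schema design (units are arbitrary)."""
    name: str
    cost_per_query: float  # assumed constant average cost of one workload query


def break_even_queries(current: SchemaOption, candidate: SchemaOption,
                       migration_cost: float) -> float:
    """Number of queries after which migrating to `candidate` pays off.

    Returns infinity when the candidate is not cheaper per query,
    i.e. the migration cost can never be recovered.
    """
    saving = current.cost_per_query - candidate.cost_per_query
    if saving <= 0:
        return float("inf")
    return migration_cost / saving


def should_migrate(current: SchemaOption, candidate: SchemaOption,
                   migration_cost: float, expected_queries: float) -> bool:
    """Migrate only if the expected workload exceeds the break-even point."""
    return expected_queries > break_even_queries(current, candidate, migration_cost)


if __name__ == "__main__":
    old = SchemaOption("current", 12.0)
    new = SchemaOption("redesigned", 4.0)
    print(break_even_queries(old, new, 40000.0))      # 5000.0
    print(should_migrate(old, new, 40000.0, 10000))   # True
    print(should_migrate(old, new, 40000.0, 3000))    # False
```

Under these assumptions, a redesign that saves 8 cost units per query recovers a migration cost of 40000 units after 5000 queries; a real decision would also have to account for workload drift over time, which this constant-cost model ignores.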

Gallinucci, E., Golfarelli, M., Radwan, W., Zarate, G., Abello, A. (2025). Impact Study of NoSQL Refactoring in SkyServer Database. CEUR-WS.

Impact Study of NoSQL Refactoring in SkyServer Database

Gallinucci E.; Golfarelli M.; Radwan W.; Zarate G.; Abello A.
2025

Proceedings of the 27th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2025, pp. 1-11
Files in this item:
paper1.pdf (open access)
Type: Publisher's version (PDF) / Version of Record
License: Open Access license, Creative Commons Attribution (CC BY)
Size: 2.09 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1013631