Gallinucci, E., Golfarelli, M., Radwan, W., Zarate, G., Abello, A. (2025). Impact Study of NoSQL Refactoring in SkyServer Database. CEUR-WS.
Impact Study of NoSQL Refactoring in SkyServer Database
Gallinucci E.; Golfarelli M.
2025
Abstract
Data modeling in NoSQL databases is notoriously complex and driven by multiple, possibly conflicting requirements. Researchers have proposed methodologies to optimize the schema design of a given domain for a given workload; however, because NoSQL databases are usually employed in agile environments, both domain and workload frequently change and evolve, possibly neutralizing the benefits of optimization. When this happens, the benefits of a new optimal schema design must be weighed against the costs of migrating the data. In this work, we empirically show the benefits of schema redesign in a real, publicly available database. In particular, we identify multiple snapshots (in terms of domain extension and querying workload) in the 20+ year evolution of SkyServer, demonstrate how NoSQL schema optimization at a given time can later backfire, and evaluate the conditions under which data migration becomes beneficial. This leads us to define the foundations and challenges of a framework for continuous NoSQL database refactoring, with the goal of helping DBAs and data engineers decide if, when, and how a NoSQL database should be reconsidered to restore schema design optimality.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| paper1.pdf | Open access | Publisher's version (PDF) / Version Of Record | Open Access License, Creative Commons Attribution (CC BY) | 2.09 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


