Brahmia, S., Brahmia, Z., Grandi, F., Bouaziz, R. (2024). A Temporal Metadata Management Approach for Data Lakes. Cham : Springer Nature [10.1007/978-3-031-65018-5_4].
A Temporal Metadata Management Approach for Data Lakes
Grandi, Fabio
2024
Abstract
A data lake stores heterogeneous big data, in their native format, without any predefined schema, while providing support for querying and analyzing such big data. Metadata are necessary for describing the big data stored in the data lake, and metadata management and querying are among the most important functionalities of a data lake management system. However, although metadata are temporal by their nature, existing metadata models for data lakes do not provide support for managing the evolution over time of metadata; the conventional metadata versioning that is supported by some of these models does not timestamp data versions and does not manage these versions according to the rules and operations already defined in the temporal database field for the management of time-varying data. For these reasons, we propose in this paper a temporal metadata management approach for data lakes. This approach is based on a temporal metadata model for data lakes, named T-goldMEDAL, defined as a temporal extension of the conventional metadata model goldMEDAL; the latter has been chosen since it is the most generic/abstract and flexible model among those published in the literature of data lakes. Moreover, to make our model useful, we complete our approach with the proposal of a temporal query language, named QL4-T-goldMEDAL, for querying temporal metadata in a T-goldMEDAL data lake.

File | Access | Description | Type | License | Size | Format |
---|---|---|---|---|---|---|
ICBDBI2024_2-frontmatter.pdf | Open access | frontmatter | Supplementary file | Free open-access license | 125.05 kB | Adobe PDF |
ICBDBI2024_2_Accepted.pdf | Embargoed until 17/08/2025 | accepted-version | Postprint | Free open-access license | 916.13 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.