A data lake stores heterogeneous big data in its native format, without any predefined schema, while providing support for querying and analyzing such data. Metadata are necessary for describing the big data stored in the data lake, and metadata management and querying are among the most important functionalities of a data lake management system. However, although metadata are temporal by nature, existing metadata models for data lakes do not support managing the evolution of metadata over time; the conventional metadata versioning supported by some of these models does not timestamp data versions and does not manage these versions according to the rules and operations already defined in the temporal database field for the management of time-varying data. For these reasons, we propose in this paper a temporal metadata management approach for data lakes. This approach is based on a temporal metadata model for data lakes, named T-goldMEDAL, defined as a temporal extension of the conventional metadata model goldMEDAL; the latter was chosen because it is the most generic/abstract and flexible model among those published in the data lake literature. Moreover, to make our model useful, we complete our approach with a temporal query language, named QL4-T-goldMEDAL, for querying temporal metadata in a T-goldMEDAL data lake.

Brahmia, S., Brahmia, Z., Grandi, F., Bouaziz, R. (2024). A Temporal Metadata Management Approach for Data Lakes. Cham : Springer Nature [10.1007/978-3-031-65018-5_4].

A Temporal Metadata Management Approach for Data Lakes

Grandi, Fabio;
2024

In: Artificial Intelligence, Big Data, IOT and Block Chain in Healthcare: From Concepts to Applications - Volume 2, pp. 35-44 (2024)
Brahmia, Safa; Brahmia, Zouhaier; Grandi, Fabio; Bouaziz, Rafik
Files in this record:

ICBDBI2024_2-frontmatter.pdf
Access: open access
Description: frontmatter
Type: Supplementary file
License: Free open-access license
Size: 125.05 kB
Format: Adobe PDF

ICBDBI2024_2_Accepted.pdf
Access: embargoed until 17/08/2025
Description: accepted version
Type: Postprint
License: Free open-access license
Size: 916.13 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/978515