Metadata Management in Data Lake Environments: a Survey

Boukraaa, Doulkifli; Balab, Mahfoud; Rizzi, Stefano

doi:10.1080/19386389.2024.2359310

Data lakes are storage repositories that contain large amounts of data in its native format, whether structured, semi-structured, or unstructured, to be used when needed. Data lakes are open to a wide range of use cases, such as carrying out advanced analytics and extracting knowledge patterns. However, simply dumping all the data into a data lake would only lead to a so-called data swamp. To prevent such a situation, enterprises can adopt best practices, among them building and maintaining metadata. In recent years there has been a growing body of research on managing metadata in data lake environments. Existing research efforts deal separately with different activities such as metadata modeling, metadata capture and extraction, and metadata usage. Nevertheless, despite its importance, a global view of the research landscape on metadata management for data lakes is still missing. This survey brings together different facets of metadata management in data lakes and presents a global view, along with the technological implications and the features required for building successful metadata management systems. In addition, this survey summarizes and discusses research gaps, open problems, and the main challenges facing both industry practitioners and academics. This survey pertains to the broader field of Big Data and especially to the data platforms that manage enterprise big data assets. Furthermore, given the parallels between data lakes and digital libraries in their dependence on metadata for content management, this study could offer valuable insights to the digital library community by giving it a technological outlook on metadata management.

Boukraaa, D., Balab, M., Rizzi, S. (2024). Metadata Management in Data Lake Environments: a Survey. JOURNAL OF LIBRARY METADATA, 24(4), 215-274 [10.1080/19386389.2024.2359310].



Year: 2024
Journal: JOURNAL OF LIBRARY METADATA
DOI: https://dx.doi.org/10.1080/19386389.2024.2359310
Citation: Boukraaa, D., Balab, M., Rizzi, S. (2024). Metadata Management in Data Lake Environments: a Survey. JOURNAL OF LIBRARY METADATA, 24(4), 215-274 [10.1080/19386389.2024.2359310].
All authors: Boukraaa, Doulkifli; Balab, Mahfoud; Rizzi, Stefano


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/997624

