The rise of data platforms has enabled the collection and processing of huge volumes of data, but has opened to the risk of losing their control. Collecting proper metadata about raw data and transformations can significantly reduce this risk. In this paper we propose MOSES, a technology-agnostic, extensible, and customizable framework for metadata handling in big data platforms. The framework hinges on a metadata repository that stores information about the objects in the big data platform and the processes that transform them. MOSES provides a wide range of functionalities to different types of users of the platform. Differently from previous high-level proposals, MOSES is fully implemented and it was not conceived for a specific technology. Besides discussing the rationale and the features of MOSES, in this paper we describe its implementation and we test it on a real case study. The ultimate goal is to take a significant step forward towards proving that metadata handling in big data platforms is feasible and beneficial.

Making data platforms smarter with MOSES

Matteo Francia;Enrico Gallinucci;Matteo Golfarelli;Anna Giulia Leoni;Stefano Rizzi;Nicola Santolini
2021

Abstract

The rise of data platforms has enabled the collection and processing of huge volumes of data, but has opened to the risk of losing their control. Collecting proper metadata about raw data and transformations can significantly reduce this risk. In this paper we propose MOSES, a technology-agnostic, extensible, and customizable framework for metadata handling in big data platforms. The framework hinges on a metadata repository that stores information about the objects in the big data platform and the processes that transform them. MOSES provides a wide range of functionalities to different types of users of the platform. Differently from previous high-level proposals, MOSES is fully implemented and it was not conceived for a specific technology. Besides discussing the rationale and the features of MOSES, in this paper we describe its implementation and we test it on a real case study. The ultimate goal is to take a significant step forward towards proving that metadata handling in big data platforms is feasible and beneficial.
FUTURE GENERATION COMPUTER SYSTEMS
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Anna Giulia Leoni, Stefano Rizzi, Nicola Santolini
File in questo prodotto:
File Dimensione Formato  
FGCS125-2021.pdf

embargo fino al 25/06/2023

Tipo: Postprint
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione - Non commerciale - Non opere derivate (CCBYNCND)
Dimensione 1.22 MB
Formato Adobe PDF
1.22 MB Adobe PDF   Visualizza/Apri   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11585/827383
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 1
social impact