The effectiveness of data-driven solutions, in Data Science as well as in Machine Learning, clearly depends on the quality and interpretability of the underlying data. Unfortunately, real-world data is often incomplete, inconsistent, biased, or lacks adequate semantic description. Traditional data preparation workflows typically rely on ad hoc methods, limited automation, and minimal consideration of domain knowledge, resulting in inefficiencies and unreliable analytical outcomes. The S-PIC4CHU project proposes embedding semantics at the core of each stage of the process: a novel architecture for data preparation, grounded in a semantics-based methodology that supports provenance tracking, integrity enforcement, and fairness assessment. This paradigm shift is based on the design and implementation of a semantically-aware Data Preparation Pipeline (DPP), integrated with a corresponding Semantic Transformation Pipeline (STP): each data transformation step is semantically annotated through mappings to ontologies and knowledge graphs, enabling enhanced traceability, transparency, and reasoning over data.
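As a rough illustration of the idea that each transformation step carries a semantic annotation, the following minimal Python sketch pairs every preparation step with a concept IRI and logs a provenance trace as the pipeline runs. It is not the project's implementation: the namespace `http://example.org/dpp#`, the concept names, and the classes `AnnotatedStep` and `Pipeline` are all hypothetical, introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]

# Hypothetical namespace: the abstract does not name the project's ontology.
EX = "http://example.org/dpp#"


@dataclass
class AnnotatedStep:
    """A data preparation step paired with a semantic annotation."""
    name: str
    transform: Callable[[List[Row]], List[Row]]
    concept: str  # IRI of the (assumed) ontology term describing the step


@dataclass
class Pipeline:
    """Runs steps in order and records one trace entry per step."""
    steps: List[AnnotatedStep]
    trace: List[dict] = field(default_factory=list)

    def run(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            out = step.transform(rows)
            # Record which step ran, its concept, and its effect on row count.
            self.trace.append({
                "step": step.name,
                "concept": step.concept,
                "rows_in": len(rows),
                "rows_out": len(out),
            })
            rows = out
        return rows


def drop_missing(rows: List[Row]) -> List[Row]:
    # Keep only rows in which every value is present.
    return [r for r in rows if all(v is not None for v in r.values())]


def clip_age(rows: List[Row]) -> List[Row]:
    # Cap implausible ages at 100 (an arbitrary threshold for the example).
    return [{**r, "age": min(r["age"], 100.0)} for r in rows]


pipeline = Pipeline([
    AnnotatedStep("drop_missing", drop_missing, EX + "MissingValueRemoval"),
    AnnotatedStep("clip_age", clip_age, EX + "OutlierClipping"),
])

data = [{"age": 34.0}, {"age": None}, {"age": 140.0}]
clean = pipeline.run(data)
```

After the run, `pipeline.trace` holds one entry per step, so a reader can see which annotated operation removed or altered rows. A fuller version would presumably emit these entries as knowledge-graph triples and not as plain dictionaries.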

Alfano, G., Bartolini, I., Calvanese, D., Ciaccia, P., Greco, S., Lanti, D., et al. (2025). Research Project Exhibition Track SEBD 2025: The S-PIC4CHU Project - Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, Unbiased Data Science. CEUR-WS.

Research Project Exhibition Track SEBD 2025: The S-PIC4CHU Project - Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, Unbiased Data Science.

Ilaria Bartolini; Paolo Ciaccia; Marco Patella
2025

Proceedings of the 33rd Symposium on Advanced Database Systems, 2025, pp. 735-751
Alfano, Gianvincenzo; Bartolini, Ilaria; Calvanese, Diego; Ciaccia, Paolo; Greco, Sergio; Lanti, Davide; Lenzi, Emilia; Martinenghi, Davide; Molinaro,...

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1050064