The S-PIC4CHU Project - Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, Unbiased Data Science.

Ilaria Bartolini; Paolo Ciaccia; Marco Patella
2025

Abstract

The effectiveness of data-driven solutions, in Data Science as well as in Machine Learning, depends on the quality and interpretability of the underlying data. Unfortunately, real-world data is often incomplete, inconsistent, or biased, or lacks an adequate semantic description. Traditional data preparation workflows typically rely on ad hoc methods, limited automation, and minimal consideration of domain knowledge, resulting in inefficiencies and unreliable analytical outcomes. The S-PIC4CHU project proposes a novel architecture for data preparation that embeds semantics at the core of each stage of the process, grounded in a semantics-based methodology that supports provenance tracking, integrity enforcement, and fairness assessment. This paradigm shift is based on the design and implementation of a semantically aware Data Preparation Pipeline (DPP), integrated with a corresponding Semantic Transformation Pipeline (STP): each data transformation step is semantically annotated through mappings to ontologies and knowledge graphs, enabling enhanced traceability, transparency, and reasoning over data.
Proceedings of the 33rd Symposium on Advanced Database Systems, 2025, pp. 423-433
Alfano, G., Bartolini, I., Calvanese, D., Ciaccia, P., Greco, S., Lanti, D., et al. (2025). The S-PIC4CHU Project - Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, Unbiased Data Science. CEUR-WS.
Alfano, Gianvincenzo; Bartolini, Ilaria; Calvanese, Diego; Ciaccia, Paolo; Greco, Sergio; Lanti, Davide; Lenzi, Emilia; Martinenghi, Davide; Molinaro,...
Files in this record:
SEBD 2025-mainRPE.pdf: supplementary file; open access; license: Creative Commons Attribution (CC BY); Adobe PDF; 914.85 kB
paper22.pdf: publisher's version (PDF) / Version of Record; open access; license: Creative Commons Attribution (CC BY); Adobe PDF; 1.1 MB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1050064