Alfano, G., Bartolini, I., Calvanese, D., Ciaccia, P., Greco, S., Lanti, D., et al. (2025). S-PIC4CHU: Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, and Unbiased Data Science. CEUR-WS.
S-PIC4CHU: Semantics-based Provenance, Integrity, and Curation for Consistent, High-quality, and Unbiased Data Science
Ilaria Bartolini; Paolo Ciaccia; Marco Patella
2025
Abstract
This paper presents the vision of the S-PIC4CHU project, which aims to develop innovative models and techniques for scalable data preparation in Data Science and Machine Learning. The project focuses on leveraging data semantics throughout all data preparation stages to improve data quality and ensure unbiased results. The proposed approach involves a novel data preparation pipeline semantically enriched with domain knowledge from ontologies and knowledge graphs, along with novel semantics-based techniques for data cleaning, integration, provenance, explanation, and quality management. The approach will be validated on use cases from different domains, with the goal of releasing the resulting tools as open source.

| File | Size | Format | Access |
|---|---|---|---|
| S_PIC4CHU_SEBD-6.pdf | 227.69 kB | Adobe PDF | Open access |

Type: Publisher's version (PDF) / Version of Record
License: Open Access license, Creative Commons Attribution (CC BY)


