A LLMOps-Driven Framework for Clinical Data Harmonization

Alberto Marfoglia; Antonio Robustelli; Christian D’Errico; Sabato Mellone; Antonella Carbonaro
2025

Abstract

The rapid growth of clinical data, driven by advances in medical research and digital health technologies, presents major challenges for data interoperability, standardization, and management. Standards such as Fast Healthcare Interoperability Resources (FHIR) are essential for enabling seamless data exchange. However, converting raw data into standardized formats remains a complex and resource-intensive task. Existing solutions often rely on manual processes or rigid rule-based systems, which are time-consuming, error-prone, and difficult to scale. To address these limitations, we propose a modular framework that employs Natural Language Processing (NLP) to streamline FHIR mapping. It also integrates Large Language Model Operations (LLMOps) principles to automate and monitor the lifecycle of the models involved in the data transformation. The framework consists of three core modules: (1) extraction of relevant clinical variables from heterogeneous data sources, (2) validation for anomaly detection and compliance with healthcare standards, and (3) assisted mapping of variables to FHIR resources. We evaluate the framework by applying it to an existing clinical data harmonization pipeline. Compared to the baseline process, our approach reduces the time required by 59%. This result underscores the potential of NLP-assisted frameworks to improve scalability, reliability, and efficiency in clinical data standardization.
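The three modules named in the abstract can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the alias table, plausibility ranges, and function names are hypothetical, and a hand-written lookup stands in for the NLP-assisted mapping step. The output follows the general shape of a FHIR Observation resource with LOINC codes and UCUM units.

```python
from dataclasses import dataclass

# Hypothetical alias table: maps raw source column names to canonical variable names.
ALIASES = {"hr": "heart_rate", "heart rate": "heart_rate", "temp": "body_temperature"}

# Hypothetical plausibility ranges used by the validation step.
RANGES = {"heart_rate": (20.0, 250.0), "body_temperature": (30.0, 45.0)}

# Hand-written lookup standing in for the NLP-assisted mapping step:
# canonical name -> (LOINC code, display text, UCUM unit).
LOINC = {
    "heart_rate": ("8867-4", "Heart rate", "/min"),
    "body_temperature": ("8310-5", "Body temperature", "Cel"),
}


@dataclass
class Variable:
    name: str
    value: float


def extract(record: dict) -> list[Variable]:
    """Module 1: pull known clinical variables out of a raw record."""
    found = []
    for key, raw in record.items():
        name = ALIASES.get(str(key).strip().lower())
        if name is None:
            continue  # not a variable this sketch knows about
        try:
            found.append(Variable(name, float(raw)))
        except (TypeError, ValueError):
            continue  # non-numeric value, skipped
    return found


def validate(variables: list[Variable]) -> tuple[list[Variable], list[Variable]]:
    """Module 2: split variables into plausible and anomalous values."""
    valid, rejected = [], []
    for var in variables:
        low, high = RANGES[var.name]
        (valid if low <= var.value <= high else rejected).append(var)
    return valid, rejected


def to_fhir(var: Variable) -> dict:
    """Module 3: map a validated variable to a FHIR Observation-shaped dict."""
    code, display, unit = LOINC[var.name]
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": code, "display": display}]},
        "valueQuantity": {
            "value": var.value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",
            "code": unit,
        },
    }


def harmonize(record: dict) -> tuple[list[dict], list[Variable]]:
    """Run extraction, validation, and mapping in sequence."""
    valid, rejected = validate(extract(record))
    return [to_fhir(var) for var in valid], rejected
```

Calling `harmonize({"HR": "72", "Temp": "98.6", "note": "stable"})` would yield one Observation for heart rate and reject the temperature, since 98.6 falls outside the assumed Celsius range. In a real deployment the validation rules and the mapping would come from the models and standards the framework manages rather than from fixed tables.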
Machine Learning Operations 2025, pp. 25-36 (2025)
Marfoglia, A., Robustelli, A., D’Errico, C., Mellone, S., Carbonaro, A. (2025). A LLMOps-Driven Framework for Clinical Data Harmonization. Aachen : Ceur Workshop proceedings.
Marfoglia, Alberto; Robustelli, Antonio; D’Errico, Christian; Mellone, Sabato; Carbonaro, Antonella
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1031436