Synthetic data is becoming an essential tool for overcoming data scarcity, class imbalance, and privacy concerns across research fields, including biomedicine. We propose Conditional Flow Matching (CFM) as a unified and efficient generative framework applicable across diverse biomedical modalities. CFM leverages conditional optimal transport to model complex data distributions while maintaining architectural simplicity and computational efficiency. We evaluate CFM on three representative case studies of increasing data-structure complexity: (i) mixed-type tabular data from an Acute Myeloid Leukemia patient cohort, including genomic landscape and survival data; (ii) standard 2D RGB biomedical images belonging to discrete classes, namely slit-lamp eye images stratified by conjunctival hyperemia; (iii) 3D Computed Tomography chest volumes for lung segmentation. Across these use cases, CFM generates high-fidelity, anatomically and semantically consistent samples, validated with ad hoc metrics and pipelines. Despite some modality-specific limitations, our results highlight CFM’s versatility and potential as a general-purpose synthetic data generation framework for healthcare and biomedical domains.

Giacometti, T., Curti, N., Zaghi, A., Remondini, D., Castellani, G. (2025). Flow-Based Synthetic Data Generation: A Unified Approach for Biomedical Tasks. Cham: Springer Nature [10.1007/978-3-032-17216-7_12].

Flow-Based Synthetic Data Generation: A Unified Approach for Biomedical Tasks

Tommaso Giacometti (first author); Nico Curti; Adriano Zaghi; Daniel Remondini (second-to-last author); Gastone Castellani (last author)
2025

Published in: Artificial Intelligence for Biomedical Data, pp. 149-156 (2025)
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1046380