Objectives: Establishing the reproducibility of expert-derived measurements on CTA exams of aortic dissection is clinically important and paramount for ground-truth determination for machine learning.

Methods: Four independent observers retrospectively evaluated CTA exams of 72 patients with uncomplicated Stanford type B aortic dissection and assessed the reproducibility of a recently proposed combination of four morphologic risk predictors (maximum aortic diameter, false lumen circumferential angle, false lumen outflow, and intercostal arteries). For the first inter-observer variability assessment, 47 CTA scans from one aortic center were evaluated by expert-observer 1 in an unconstrained clinical assessment without a standardized workflow and compared to a composite of three expert-observers (observers 2–4) using a standardized workflow. A second inter-observer variability assessment on 30 of the 47 CTA scans compared observers 3 and 4 using a constrained, standardized workflow. A third inter-observer variability assessment was performed after specialized training, comparing observers 3 and 4 in an external population of 25 CTA scans. Inter-observer agreement was assessed with intraclass correlation coefficients (ICCs) and Bland-Altman plots.

Results: Pre-training ICCs of the four morphologic features ranged from 0.04 (−0.05 to 0.13) to 0.68 (0.49–0.81) between observer 1 and observers 2–4 and from 0.50 (0.32–0.69) to 0.89 (0.78–0.95) between observers 3 and 4. ICCs improved after training, ranging from 0.69 (0.52–0.87) to 0.97 (0.94–0.99), and Bland-Altman analysis showed decreased bias and narrower limits of agreement.

Conclusions: Manual morphologic feature measurements on CTA images can be optimized, resulting in improved inter-observer reliability. This is essential for robust ground-truth determination for machine learning models.

Key Points:
• Manual measurements of aortic CTA imaging features performed in a clinical fashion showed poor inter-observer reproducibility.
• A standardized workflow with standardized training resulted in substantial improvements with excellent inter-observer reproducibility.
• Robust ground truth labels obtained manually with excellent inter-observer reproducibility are key to developing reliable machine learning models.
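The two agreement statistics named in the Methods can be illustrated with a minimal sketch. The abstract does not state which ICC form or software the authors used, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption here, as are the function names and the toy diameter values, which are hypothetical and not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean difference a - b; the limits of agreement are
    bias +/- 1.96 times the sample standard deviation of the differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per scan, each holding one value
    per observer. This ICC form is an assumption, not the paper's stated choice.
    """
    n = len(ratings)       # number of scans
    k = len(ratings[0])    # number of observers
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-scan mean square
    msc = ss_cols / (k - 1)                # between-observer mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical maximum aortic diameters (mm) from two observers.
observer_3 = [38.0, 41.5, 44.0, 36.5, 47.0]
observer_4 = [38.5, 40.5, 45.0, 37.5, 46.0]

bias, lower, upper = bland_altman(observer_3, observer_4)
icc = icc_2_1([[x, y] for x, y in zip(observer_3, observer_4)])
print(f"bias={bias:.2f} mm, limits of agreement=({lower:.2f}, {upper:.2f}) mm")
print(f"ICC(2,1)={icc:.3f}")
```

In this form a constant offset between observers lowers the ICC even when their rankings agree perfectly, which is the kind of systematic bias that a standardized workflow and training would be expected to reduce.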

Willemink M.J., Mastrodicasa D., Madani M.H., Codari M., Chepelev L.L., Mistelbauer G., et al. (2022). Inter-observer variability of expert-derived morphologic risk predictors in aortic dissection. EUROPEAN RADIOLOGY, Online ahead of print, 1-10 [10.1007/s00330-022-09056-z].

Inter-observer variability of expert-derived morphologic risk predictors in aortic dissection

Pacini D.; Folesani G.
2022

Willemink M.J.; Mastrodicasa D.; Madani M.H.; Codari M.; Chepelev L.L.; Mistelbauer G.; Hanneman K.; Ouzounian M.; Ocazionez D.; Afifi R.O.; Lacomis J...

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/902748