Neuroendocrine neoplasms (NENs) are rare and heterogeneous malignancies requiring multidisciplinary management. Large language models (LLMs) are emerging as decision-support tools, but their role in therapeutic decision-making is largely unexplored. ARTEMIS was a pilot cross-sectional study comparing three configurations—a baseline GPT, a customised GPT with static domain knowledge (GPTs), and a retrieval-augmented GPT (RAG)—against a panel of nine Italian NEN experts using twenty simulated, non-surgical cases. The primary endpoint was non-inferiority for systemic therapy recommendations; secondary endpoints included completeness, explicit uncertainty, parsimony of additional tests, costs, and variability metrics. RAG and GPTs achieved 70.0% agreement versus the expert benchmark (63.8%), meeting the exploratory –10% non-inferiority margin but not the stricter –5% threshold. Baseline GPT reached 60.0% and was not non-inferior. All AI systems consistently produced complete recommendations and expressed uncertainty more often than experts; RAG tended to propose fewer additional tests and lower associated costs. Experts showed greater variability than AI systems, and Ki-67 correlated with disagreement, indicating biological aggressiveness as a source of uncertainty. This exploratory study suggests that LLMs can approximate expert therapeutic reasoning under controlled conditions, but concordance remains limited and external validation in real-world settings is needed before clinical use.
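The non-inferiority comparison in the abstract can be illustrated with a minimal sketch. It assumes the conventional decision rule, under which a system is non-inferior when the lower confidence bound of the difference (AI minus expert) lies above the negative margin. The abstract does not report confidence bounds or the interval method, so the lower bound used here is a hypothetical placeholder chosen only to show how one result can pass the 10-point margin and fail the 5-point one. The function and class names are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonInferiorityResult:
    difference_pp: float  # point estimate, AI minus expert, in percentage points
    non_inferior: bool    # True if the lower confidence bound clears the margin


def assess_non_inferiority(ai_pct: float, expert_pct: float,
                           ci_lower_pp: float, margin_pp: float) -> NonInferiorityResult:
    """Apply the conventional rule: non-inferior if ci_lower_pp > -margin_pp.

    ci_lower_pp is the lower confidence bound of (AI - expert); it must be
    supplied by the caller because the abstract does not report it.
    """
    difference = round(ai_pct - expert_pct, 1)
    return NonInferiorityResult(difference, ci_lower_pp > -margin_pp)


# Agreement rates reported in the abstract (percent).
EXPERT_BENCHMARK = 63.8
RAG_OR_GPTS = 70.0
BASELINE_GPT = 60.0

# HYPOTHETICAL lower bound, not taken from the study.
HYPOTHETICAL_LOWER_BOUND_PP = -8.0

exploratory = assess_non_inferiority(RAG_OR_GPTS, EXPERT_BENCHMARK,
                                     HYPOTHETICAL_LOWER_BOUND_PP, margin_pp=10.0)
strict = assess_non_inferiority(RAG_OR_GPTS, EXPERT_BENCHMARK,
                                HYPOTHETICAL_LOWER_BOUND_PP, margin_pp=5.0)
baseline = assess_non_inferiority(BASELINE_GPT, EXPERT_BENCHMARK,
                                  HYPOTHETICAL_LOWER_BOUND_PP, margin_pp=10.0)

print(exploratory.difference_pp, exploratory.non_inferior)
print(strict.difference_pp, strict.non_inferior)
print(baseline.difference_pp)
```

Only the point differences (6.2 and -3.8 percentage points) follow directly from the reported agreement rates. The pass or fail outcome depends entirely on the confidence bound, which would have to be taken from the full paper.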

Lamberti, G., Panzuto, F., Massironi, S., Cives, M., La Salvia, A., Spada, F., et al. (2026). ARTEMIS: a pilot study comparing AI-based and expert therapeutic decisions in simulated clinical cases of neuroendocrine neoplasms. NPJ DIGITAL MEDICINE, 9(1), 1-10 [10.1038/s41746-025-02274-x].

ARTEMIS: a pilot study comparing AI-based and expert therapeutic decisions in simulated clinical cases of neuroendocrine neoplasms

Lamberti G. (first author); Andrini E.; Ricci C.; Campana D. (last author)
2026

Lamberti, G.; Panzuto, F.; Massironi, S.; Cives, M.; La Salvia, A.; Spada, F.; Faggiano, A.; Pusceddu, S.; Albertelli, M.; Tafuto, S.; Andrini, E.; Ri...
Files in this record:

File: s41746-025-02274-x.pdf
Access: open access
Type: Publisher's version (PDF) / Version of Record
License: Open access license. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 672.28 kB
Format: Adobe PDF

File: 41746_2025_2274_MOESM1_ESM (1).pdf
Access: open access
Type: Supplementary file
License: Open access license. Creative Commons Attribution - NonCommercial - NoDerivatives (CC BY-NC-ND)
Size: 84.18 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1048138