
Neuroendocrine neoplasms (NENs) are rare and heterogeneous malignancies requiring multidisciplinary management. Large language models (LLMs) are emerging as decision-support tools, but their role in therapeutic decision-making is largely unexplored. ARTEMIS was a pilot cross-sectional study comparing three configurations—a baseline GPT, a customised GPT with static domain knowledge (GPTs), and a retrieval-augmented GPT (RAG)—against a panel of nine Italian NEN experts using twenty simulated, non-surgical cases. The primary endpoint was non-inferiority for systemic therapy recommendations; secondary endpoints included completeness, explicit uncertainty, parsimony of additional tests, costs, and variability metrics. RAG and GPTs achieved 70.0% agreement versus the expert benchmark (63.8%), meeting the exploratory –10% non-inferiority margin but not the stricter –5% threshold. Baseline GPT reached 60.0% and was not non-inferior. All AI systems consistently produced complete recommendations and expressed uncertainty more often than experts; RAG tended to propose fewer additional tests and lower associated costs. Experts showed greater variability than AI systems, and Ki-67 correlated with disagreement, indicating biological aggressiveness as a source of uncertainty. This exploratory study suggests that LLMs can approximate expert therapeutic reasoning under controlled conditions, but concordance remains limited and external validation in real-world settings is needed before clinical use.
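The non-inferiority result reported above can be read through the usual decision rule: a system is declared non-inferior when the lower confidence bound of the difference (AI agreement minus expert agreement) lies above the negative margin. The sketch below is illustrative only. The function name `is_non_inferior` and the lower bound of -0.07 are hypothetical, and the abstract does not state which confidence-interval method or bounds the study used. Only the 70.0%, 60.0%, and 63.8% agreement figures and the two margins come from the abstract.

```python
def is_non_inferior(diff_lower_bound: float, margin: float) -> bool:
    """Generic non-inferiority rule (an assumption, not the study's stated method).

    diff_lower_bound: lower confidence bound of (AI - expert) agreement,
                      expressed as a proportion.
    margin:           positive non-inferiority margin, e.g. 0.10 for -10%.
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    return diff_lower_bound > -margin


# Agreement figures quoted in the abstract, as proportions.
EXPERT_BENCHMARK = 0.638
ai_agreement = {"RAG": 0.700, "GPTs": 0.700, "baseline GPT": 0.600}

# Point differences versus the expert benchmark (no uncertainty attached).
point_diff = {name: p - EXPERT_BENCHMARK for name, p in ai_agreement.items()}

# HYPOTHETICAL lower bound, chosen only to show how one result can pass
# the exploratory -10% margin while failing the stricter -5% margin.
hypothetical_lower_bound = -0.07

passes_10 = is_non_inferior(hypothetical_lower_bound, margin=0.10)
passes_05 = is_non_inferior(hypothetical_lower_bound, margin=0.05)

if __name__ == "__main__":
    for name, diff in point_diff.items():
        print(f"{name}: point difference {diff:+.3f}")
    print("passes -10% margin:", passes_10)
    print("passes -5% margin:", passes_05)
```

The point differences alone do not settle the question. Baseline GPT sits only 3.8 percentage points below the benchmark yet was not non-inferior, which suggests that the verdict depends on the width of the confidence interval and not on the point estimate.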

Lamberti, G., Panzuto, F., Massironi, S., Cives, M., La Salvia, A., Spada, F., et al. (2026). ARTEMIS: a pilot study comparing AI-based and expert therapeutic decisions in simulated clinical cases of neuroendocrine neoplasms. NPJ DIGITAL MEDICINE, 9(1), 1-10 [10.1038/s41746-025-02274-x].

ARTEMIS: a pilot study comparing AI-based and expert therapeutic decisions in simulated clinical cases of neuroendocrine neoplasms

Lamberti G. (first author); Panzuto F.; Massironi S.; Cives M.; La Salvia A.; Spada F.; Faggiano A.; Pusceddu S.; Albertelli M.; Tafuto S.; Andrini E.; Ricci C.; Campana D. (last author)

2026



Year: 2026
Journal: NPJ DIGITAL MEDICINE
DOI: https://dx.doi.org/10.1038/s41746-025-02274-x
Appears in the following types: 1.01 Journal article

Files in this record:

File	Size	Format
s41746-025-02274-x.pdf (open access; type: publisher's PDF / Version of Record; license: Creative Commons Attribution-NonCommercial-NoDerivatives, CC BY-NC-ND)	672.28 kB	Adobe PDF
41746_2025_2274_MOESM1_ESM (1).pdf (open access; type: supplementary file; license: Creative Commons Attribution-NonCommercial-NoDerivatives, CC BY-NC-ND)	84.18 kB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1048138
