
Tayeb, S., Barausse, C., Pellegrino, G., Sansavini, M., Pistilli, R., Felice, P. (2025). Comparing Artificial Intelligence (ChatGPT, Gemini, DeepSeek) and Oral Surgeons in Detecting Clinically Relevant Drug–Drug Interactions in Dental Therapy. APPLIED SCIENCES, 15(23), 1-16 [10.3390/app152312851].

Comparing Artificial Intelligence (ChatGPT, Gemini, DeepSeek) and Oral Surgeons in Detecting Clinically Relevant Drug–Drug Interactions in Dental Therapy

Tayeb S.; Barausse C.; Pellegrino G.; Sansavini M.; Pistilli R.; Felice P.
2025

Abstract

Patients undergoing oral surgery are frequently polymedicated, and preoperative prescriptions (analgesics, corticosteroids, antibiotics) can generate clinically significant drug–drug interactions (DDIs) associated with bleeding risk, serotonin toxicity, cardiovascular instability, and other adverse events. This study prospectively evaluated whether large language models (LLMs) can assist in detecting clinically relevant DDIs at the point of care. Five LLMs (ChatGPT-5, DeepSeek-Chat, DeepSeek-Reasoner, Gemini-Flash, and Gemini-Pro) were compared with a panel of experienced oral surgeons on 500 standardized oral-surgery cases constructed from realistic chronic medication profiles and typical postoperative regimens. For each case, all chronic and procedure-related drugs were provided, and the task was to identify DDIs and rate their severity on an ordinal Lexicomp-based scale (A–X), with D/X considered “action required”. Primary outcomes were exact agreement with surgeon consensus and ordinal concordance; secondary outcomes included sensitivity for actionable DDIs, specificity, error patterns, and response latency. DeepSeek-Chat reached the highest exact agreement with surgeons (50.6%) and showed perfect specificity (100%) but low sensitivity (18%), missing 82% of actionable D/X alerts. ChatGPT-5 showed the highest sensitivity (98.0%) but lower specificity (56.7%) and generated more false-positive warnings. Median response time was 3.6 s for the fastest model versus 225 s for expert review. These findings indicate that current LLMs can deliver rapid, structured DDI screening in oral surgery but exhibit distinct safety trade-offs between missed critical interactions and alert overcalling. They should therefore be regarded as decision-support tools rather than substitutes for clinical judgment, and their integration should prioritize validated, supervised workflows.
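The sensitivity and specificity figures above follow the standard confusion-matrix definitions once the ordinal A–X grades are dichotomized at the D/X "action required" threshold. The sketch below illustrates that computation; the case grades are hypothetical illustrative values, not data from the study.

```python
# Sketch of the sensitivity/specificity computation for actionable (D/X) alerts.
# Grades below are illustrative examples, NOT the study's data.

ACTIONABLE = {"D", "X"}  # Lexicomp severities treated as "action required"

def confusion_counts(model_grades, surgeon_grades):
    """Reduce ordinal A-X grades to a binary actionable/non-actionable call
    and tally the confusion matrix against the surgeon consensus."""
    tp = fp = tn = fn = 0
    for model, surgeon in zip(model_grades, surgeon_grades):
        model_hit = model in ACTIONABLE
        truth = surgeon in ACTIONABLE
        if truth and model_hit:
            tp += 1          # actionable DDI correctly flagged
        elif truth and not model_hit:
            fn += 1          # actionable DDI missed
        elif not truth and model_hit:
            fp += 1          # over-called alert
        else:
            tn += 1          # correctly left unflagged
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    # Share of surgeon-rated D/X interactions the model caught.
    return tp / (tp + fn) if tp + fn else float("nan")

def specificity(tn, fp):
    # Share of non-actionable cases the model correctly left unflagged.
    return tn / (tn + fp) if tn + fp else float("nan")

# Illustrative run: surgeons rate 4 of 6 cases D/X; the model misses one.
surgeons = ["D", "X", "C", "D", "A", "X"]
model    = ["D", "C", "C", "D", "B", "X"]
tp, fp, tn, fn = confusion_counts(model, surgeons)
print(sensitivity(tp, fn))   # 3/4 = 0.75
print(specificity(tn, fp))   # 2/2 = 1.0
```

Under this dichotomization, a model like DeepSeek-Chat maximizes the second quantity at the cost of the first, while a model like ChatGPT-5 does the reverse, which is the safety trade-off the abstract describes.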
Files in this record:

2025 - AI drug interactions - Tayeb_AS.pdf
Open access
Type: Publisher's version (Version of Record)
License: Open Access license, Creative Commons Attribution (CC BY)
Size: 1.35 MB, Adobe PDF

applsci-15-12851-s001.zip
Open access
Type: Supplementary file
License: Open Access license, Creative Commons Attribution (CC BY)
Size: 219.17 kB, ZIP

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1048590
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: 1