
Tayeb, S., Barausse, C., Pellegrino, G., Sansavini, M., Pistilli, R., Felice, P. (2025). Comparing Artificial Intelligence (ChatGPT, Gemini, DeepSeek) and Oral Surgeons in Detecting Clinically Relevant Drug–Drug Interactions in Dental Therapy. APPLIED SCIENCES, 15(23), 1-16 [10.3390/app152312851].

Comparing Artificial Intelligence (ChatGPT, Gemini, DeepSeek) and Oral Surgeons in Detecting Clinically Relevant Drug–Drug Interactions in Dental Therapy

Tayeb S.; Barausse C.; Pellegrino G.; Sansavini M.; Pistilli R.; Felice P.
2025

Abstract

Patients undergoing oral surgery are frequently polymedicated, and preoperative prescriptions (analgesics, corticosteroids, antibiotics) can generate clinically significant drug–drug interactions (DDIs) associated with bleeding risk, serotonin toxicity, cardiovascular instability, and other adverse events. This study prospectively evaluated whether large language models (LLMs) can assist in detecting clinically relevant DDIs at the point of care. Five LLMs (ChatGPT-5, DeepSeek-Chat, DeepSeek-Reasoner, Gemini-Flash, and Gemini-Pro) were compared with a panel of experienced oral surgeons on 500 standardized oral-surgery cases constructed from realistic chronic medication profiles and typical postoperative regimens. For each case, all chronic and procedure-related drugs were provided, and the task was to identify DDIs and rate their severity on an ordinal Lexicomp-based scale (A–X), with D/X considered “action required”. Primary outcomes were exact agreement with surgeon consensus and ordinal concordance; secondary outcomes included sensitivity for actionable DDIs, specificity, error patterns, and response latency. DeepSeek-Chat reached the highest exact agreement with surgeons (50.6%) and showed perfect specificity (100%) but low sensitivity (18%), missing 82% of actionable D/X alerts. ChatGPT-5 showed the highest sensitivity (98.0%) but lower specificity (56.7%) and generated more false-positive warnings. Median response time was 3.6 s for the fastest model versus 225 s for expert review. These findings indicate that current LLMs can deliver rapid, structured DDI screening in oral surgery but exhibit distinct safety trade-offs between missed critical interactions and alert overcalling. They should therefore be regarded as decision-support tools rather than substitutes for clinical judgment, and their integration should prioritize validated, supervised workflows.
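The sensitivity and specificity figures above follow the standard confusion-matrix definitions once the ordinal A–X grades are dichotomized at the D/X "action required" threshold. The sketch below illustrates that computation; the case grades are hypothetical illustrative values, not data from the study.

```python
# Sketch of the sensitivity/specificity computation for actionable (D/X) alerts.
# Grades below are illustrative examples, NOT the study's data.

ACTIONABLE = {"D", "X"}  # Lexicomp severities treated as "action required"

def confusion_counts(model_grades, surgeon_grades):
    """Reduce ordinal A-X grades to a binary actionable/non-actionable call
    and tally the confusion matrix against the surgeon consensus."""
    tp = fp = tn = fn = 0
    for model, surgeon in zip(model_grades, surgeon_grades):
        model_hit = model in ACTIONABLE
        truth = surgeon in ACTIONABLE
        if truth and model_hit:
            tp += 1          # actionable DDI correctly flagged
        elif truth and not model_hit:
            fn += 1          # actionable DDI missed
        elif not truth and model_hit:
            fp += 1          # over-called alert
        else:
            tn += 1          # correctly left unflagged
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    # Share of surgeon-rated D/X interactions the model caught.
    return tp / (tp + fn) if tp + fn else float("nan")

def specificity(tn, fp):
    # Share of non-actionable cases the model correctly left unflagged.
    return tn / (tn + fp) if tn + fp else float("nan")

# Illustrative run: surgeons rate 4 of 6 cases D/X; the model misses one.
surgeons = ["D", "X", "C", "D", "A", "X"]
model    = ["D", "C", "C", "D", "B", "X"]
tp, fp, tn, fn = confusion_counts(model, surgeons)
print(sensitivity(tp, fn))   # 3/4 = 0.75
print(specificity(tn, fp))   # 2/2 = 1.0
```

Under this dichotomization, a model like DeepSeek-Chat maximizes the second quantity at the cost of the first, while a model like ChatGPT-5 does the reverse, which is the safety trade-off the abstract describes.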
Files in this record:

2025 - AI drug interactions - Tayeb_AS.pdf
Open access
Type: Publisher's version (Version of Record)
License: Open Access license, Creative Commons Attribution (CC BY)
Size: 1.35 MB, Adobe PDF

applsci-15-12851-s001.zip
Open access
Type: Supplementary file
License: Open Access license, Creative Commons Attribution (CC BY)
Size: 219.17 kB, ZIP

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1048590
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: 1