OBJECTIVE: To compare the accuracy of an artificial intelligence chatbot with clinical practice guideline (CPG) recommendations in answering complex clinical questions on lumbosacral radicular pain.
DESIGN: Cross-sectional study.
METHODS: We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Related clinical questions were developed and submitted to OpenAI's ChatGPT (GPT-3.5). We compared ChatGPT's answers with CPG recommendations by assessing (i) the internal consistency of ChatGPT's answers, measured as the percentage of text wording similarity when each clinical question was posed three times; (ii) the reliability between two independent reviewers in grading ChatGPT's answers; and (iii) the accuracy of ChatGPT's answers compared with CPG recommendations. Reliability was estimated using Fleiss' kappa (κ) coefficients, and accuracy as inter-observer agreement, i.e., the frequency of agreements among all judgements.
RESULTS: We tested nine clinical questions. The internal consistency of ChatGPT's text answers was unacceptable across all three trials for all clinical questions (mean similarity 49%, standard deviation 15). Intra-rater reliability (reviewer 1: κ = 0.90, standard error (SE) = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and inter-rater reliability between the two reviewers (κ = 0.85, SE = 0.15) were "almost perfect". Accuracy of ChatGPT's answers against CPG recommendations was slight, with agreement in 33% of recommendations.
CONCLUSION: ChatGPT performed poorly in both internal consistency and accuracy of the indications it generated, compared with clinical practice guideline recommendations for lumbosacral radicular pain.
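The three measures named in the methods can be illustrated with a short Python sketch. Fleiss' kappa follows its standard definition (observed agreement corrected for chance agreement). The Landis and Koch cut-offs used to produce labels such as "slight" and "almost perfect" are an assumption, inferred from the wording of the results rather than stated in the abstract. The word-level similarity based on `difflib.SequenceMatcher` is a hypothetical stand-in, because the abstract does not name the tool used to measure text similarity. All function names are illustrative, and the example data are toy values, not the study's data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for N subjects, each rated by the same number n of raters.

    counts[i][j] is the number of raters who put subject i in category j.
    """
    if not counts or not counts[0]:
        raise ValueError("need at least one subject and one category")
    n = sum(counts[0])  # raters per subject
    k = len(counts[0])  # number of categories
    if n < 2 or any(len(row) != k or sum(row) != n for row in counts):
        raise ValueError("each subject needs the same categories and the same n >= 2 ratings")
    N = len(counts)
    # Observed agreement: share of agreeing rater pairs per subject, averaged over subjects.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Expected (chance) agreement from the overall category proportions.
    p_e = sum((sum(row[j] for row in counts) / (N * n)) ** 2 for j in range(k))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch_label(kappa: float) -> str:
    """Verbal label from the Landis and Koch bands (assumed, not stated in the abstract)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


def percent_agreement(judgements_a: list[str], judgements_b: list[str]) -> float:
    """Percentage of paired judgements that match (e.g. chatbot answer vs. CPG recommendation)."""
    if not judgements_a or len(judgements_a) != len(judgements_b):
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(judgements_a, judgements_b))
    return 100.0 * matches / len(judgements_a)


def mean_pairwise_similarity(answers: list[str]) -> float:
    """Mean word-level similarity (%) over all pairs of repeated answers.

    Hypothetical stand-in: difflib's SequenceMatcher ratio on word lists; the
    study's actual similarity tool is not named in the abstract.
    """
    pairs = list(combinations(answers, 2))
    if not pairs:
        raise ValueError("need at least two answers")
    total = sum(SequenceMatcher(None, x.split(), y.split()).ratio() for x, y in pairs)
    return 100.0 * total / len(pairs)


if __name__ == "__main__":
    # Toy data only (two raters, four answers, categories: agree / disagree); not study data.
    toy_counts = [[2, 0], [1, 1], [0, 2], [2, 0]]
    kappa = fleiss_kappa(toy_counts)
    print(f"kappa = {kappa:.3f} ({landis_koch_label(kappa)})")
    print(percent_agreement(["agree", "disagree", "agree"], ["agree", "agree", "agree"]))
    print(mean_pairwise_similarity(["rest and walk", "rest and walk daily", "walk daily"]))
```

Keeping the three measures as separate functions mirrors the abstract's separation of internal consistency, reliability and accuracy; a validated library implementation would be preferable for real analyses.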

Performance of ChatGPT compared to clinical practice guidelines in making informed decisions for Lumbosacral Radicular Pain: A cross-sectional study / Gianola, Silvia; Bargeri, Silvia; Castellini, Greta; Cook, Chad; Palese, Alvisa; Pillastrini, Paolo; Salvalaggio, Silvia; Turolla, Andrea; Rossettini, Giacomo. - In: JOURNAL OF ORTHOPAEDIC & SPORTS PHYSICAL THERAPY. - ISSN 0190-6011. - PRINT. - 1:(2024), pp. 1-18. [10.2519/jospt.2024.12151]

Performance of ChatGPT compared to clinical practice guidelines in making informed decisions for Lumbosacral Radicular Pain: A cross-sectional study

Pillastrini, Paolo; Turolla, Andrea
2024

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/954564