Maida, M., Vitello, A., Macaluso, F.S., Daperno, M., Mocci, G., Rispo, A., et al. (2026). Performance of GPT-5 in the Interpretation of IBD Histopathology Reports. UNITED EUROPEAN GASTROENTEROLOGY JOURNAL, 14(1), 1-7 [10.1002/ueg2.70161].

Performance of GPT-5 in the Interpretation of IBD Histopathology Reports

Maida M.; Vitello A.; Macaluso F. S.; Daperno M.; Mocci G.; Rispo A.; Calabrese G.; Decarli N. L.; Laschi L.; Fattorini C.; Locci G.; Sordo R. D.; Ligresti D.; Tacelli M.; Furnari M.; Sferrazza S.; Marasco G.; Facciorusso A.; Orlando A.; Villanacci V.

2026

Abstract

Background: Histopathological interpretation is crucial for diagnosing inflammatory bowel disease (IBD) and for distinguishing between Crohn's Disease (CD), Ulcerative Colitis (UC), IBD-Unclassified (IBD-U), and Non-IBD colitis (NIBDC). However, interobserver variability and limited expertise can reduce diagnostic accuracy. Large Language Models (LLMs) such as GPT-5 may offer clinical support in interpreting histology reports.

Methods: We analyzed 100 real-life histological reports from ileo-colonoscopies, equally representing CD, UC, IBD-U, and NIBDC, collected across five Italian healthcare centers, including both IBD-specialized and non-specialized hospitals. A reference standard was established by an expert pathologist. Independent classifications were generated by GPT-5, five gastrointestinal pathologists, five IBD-expert gastroenterologists (GIs), and five non-expert GIs. Diagnostic performance (accuracy, recall, precision, F1-score), agreement with the reference standard (Cohen's kappa), and inter-rater reliability (Fleiss' kappa) were assessed.

Results: GPT-5 achieved the highest accuracy (76.0%), compared with pathologists (68.6%), IBD-experts (69.2%), and non-experts (63.2%). Agreement with the reference standard was substantial for GPT-5 (kappa = 0.671) and moderate for the human groups (kappa = 0.508-0.588). GPT-5 showed perfect recall for CD and UC and high recall for NIBDC (96.0%), but poor performance for IBD-U (recall 8.0%, F1-score 14.3%). Fleiss' kappa indicated moderate agreement among pathologists and among IBD-experts, and fair agreement among non-experts.

Conclusion: GPT-5 demonstrated reliable performance in interpreting IBD histological reports, exhibiting high accuracy and strong agreement with the reference standard. While unreliable for IBD-U, GPT-5 may serve as a supportive tool in the histopathological interpretation of IBD, particularly in centers with limited access to expert pathologists or IBD specialists.
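The metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names, the toy label lists, and the per-class counts below are hypothetical. The counts (2 true positives, 1 false positive, 23 false negatives) were chosen only because, assuming 25 IBD-U reports, they reproduce the reported 8.0% recall and 14.3% F1-score; the actual confusion matrix is not given in this record.

```python
from collections import Counter


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-class precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical IBD-U counts (an assumption, not data from the paper).
    p, r, f1 = precision_recall_f1(tp=2, fp=1, fn=23)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

    # Toy labels, purely illustrative.
    reference = ["CD", "CD", "UC", "UC"]
    predicted = ["CD", "CD", "UC", "CD"]
    print(f"kappa={cohen_kappa(reference, predicted):.3f}")
```

In the toy example the raters agree on 3 of 4 items (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5. Fleiss' kappa, used in the study for inter-rater reliability among more than two raters, is not covered by this sketch.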

Year: 2026
Journal: UNITED EUROPEAN GASTROENTEROLOGY JOURNAL
DOI: https://dx.doi.org/10.1002/ueg2.70161
Appears in types: 1.01 Journal article

Files in this item:

File	Access	Type	License	Size	Format
1M.pdf	Open access	Publisher's version (PDF) / Version of Record	Open access license. Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)	881.38 kB	Adobe PDF
ueg270161-sup-0001-suppl-data.docx	Open access	Supplementary file	Open access license. Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)	295.04 kB	Microsoft Word XML

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1039370
