Signer, B., Mokhtari, A., Cazzaniga, S., Brand, F., Caro, G., De Viragh, P. A., et al. (2026). Publicly Available Large Language Models for Trichoscopy: A Head-to-Head Comparison with Dermatologists. Diagnostics, 16(1), 1-9. doi: 10.3390/diagnostics16010169.
Publicly Available Large Language Models for Trichoscopy: A Head-to-Head Comparison with Dermatologists
Iorizzo, M.; Piraccini, B. M.; Starace, M.
2026
Abstract
Background/Objectives: Trichoscopy is an important diagnostic tool for hair and scalp disorders, but it requires significant expertise. Publicly available large language models (LLMs) are becoming more popular among both physicians and patients, yet their usefulness in trichology is unknown. We aimed to evaluate the diagnostic accuracy of four publicly available LLMs when interpreting trichoscopic images, as well as to compare their performance with that of dermatology residents, board-certified dermatologists, and trichology experts. Method: In this prospective comparative study, a preprocessed set of trichoscopic images was assessed in an online image-based survey. To reduce recognition bias from public image repositories, all images were structurally transformed while preserving diagnostic features. Fifteen dermatologists (five residents, four board-certified dermatologists, six trichology experts) provided a suspected diagnosis (SD) and up to three differential diagnoses (DD). Four LLMs (ChatGPT-4o, Claude Sonnet 4, Gemini 2.5 Flash, and Grok-3) evaluated the images under the same conditions. Results: The overall diagnostic accuracy among the 15 dermatologists was 58.1% (95% CI, 53.0-63.0) for SD and 68.3% (95% CI, 63.4-72.8) for SD + DD. Experts significantly outperformed residents and board-certified dermatologists. AI models achieved an accuracy of 18.2% (95% CI, 11.8-26.9) for SD and 44.4% (95% CI, 35.0-54.3) for SD + DD. Gemini 2.5 Flash performed best, with an accuracy of 62.5% for SD + DD. Agreement among dermatologists increased with experience (AC1 up to 0.65 for experts), while agreement among AI models was moderate to good (AC1 up to 0.70). Agreement between AI models and dermatologists was only slight to fair (AC1 = 0.06 for SD and 0.21 for SD + DD). All human-AI differences were statistically significant (p < 0.001).
Conclusions: In trichology, publicly available LLMs currently underperform compared to human experts, especially in providing a single correct diagnosis. These models require further development and specialized training before they can reliably assist with trichological diagnoses in routine care.
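The agreement figures in the abstract are Gwet's AC1, a chance-corrected agreement coefficient that is more stable than Cohen's kappa when category prevalences are skewed. As an illustrative sketch only (not the authors' analysis code), AC1 for two raters assigning categorical diagnoses can be computed as follows; the diagnosis labels in the example are hypothetical:

```python
from collections import Counter

def gwet_ac1(rater1, rater2):
    """Gwet's AC1 chance-corrected agreement for two raters on
    categorical labels (illustrative sketch, two raters only)."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    categories = sorted(set(rater1) | set(rater2))
    q = len(categories)
    # Observed agreement: fraction of items both raters labeled identically.
    pa = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Average marginal proportion per category across both raters.
    c1, c2 = Counter(rater1), Counter(rater2)
    pi = {k: (c1[k] + c2[k]) / (2 * n) for k in categories}
    # Gwet's chance-agreement term.
    pe = sum(p * (1 - p) for p in pi.values()) / (q - 1)
    return (pa - pe) / (1 - pe)

# Hypothetical example: two raters, four trichoscopic images.
r1 = ["AA", "AGA", "AA", "TE"]   # AA = alopecia areata, TE = telogen effluvium
r2 = ["AA", "AGA", "TE", "TE"]
print(round(gwet_ac1(r1, r2), 3))  # → 0.628
```

Multi-rater AC1 (as needed for 15 dermatologists or 4 LLMs) generalizes this pairwise form; dedicated packages such as irrCAC implement it directly.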
| File | Type | License | Size | Format |
|---|---|---|---|---|
| diagnostics-16-00169-v2.pdf (open access) | Publisher's PDF / Version of Record | Creative Commons Attribution (CC BY) | 413.2 kB | Adobe PDF |
| diagnostics-16-00169-s001.zip (open access) | Supplementary file | Creative Commons Attribution (CC BY) | 108.84 kB | Zip file |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


