🤖 AI Summary
This study addresses the opacity and poor verifiability of diagnostic reasoning in dermatology. To this end, we propose an interpretable vision-language reasoning framework grounded in explicit Chain-of-Thought (CoT) prompting. Methodologically, we construct DermCoT, the first dermatology-specific CoT corpus, and design DermEval, a clinically aligned six-dimensional evaluation framework, alongside the benchmark DermBench. We further introduce an adapter-only dual-distillation architecture that jointly incorporates visual distillation, multi-task supervision, and expert-annotated data for end-to-end training. Experimental results show that the model scores 4.031 out of 5 on DermBench, about 41% above Vision-R1, and improves accuracy over Vision-R1 on three dermatology classification benchmarks while remaining competitive with other strong vision-language models. Both diagnostic accuracy and the quality of the natural-language reasoning improve, making the explanations more medically plausible and easier to verify.
📝 Abstract
We present SkinGPT-R1, a dermatology-focused vision-language model that makes diagnostic chain-of-thought reasoning explicit, step by step, and verifiable. To support skin-specific reasoning, we build DermCoT, a corpus of standardized dermatologic chain-of-thought narratives that combines 10,000 DermEval-filtered training cases with 3,000 dermatologist-scored certified cases. We define DermEval as a physician-aligned six-dimensional evaluator and DermBench as the corresponding benchmark for dermatologic chain-of-thought quality. On DermBench, across 14 general, reasoning, and medical vision-language models, SkinGPT-R1 achieves an average score of 4.031 out of 5 over the six clinician-defined dimensions, ranks first among all systems, and improves the average score over Vision-R1 by about 41%. On three dermatology classification benchmarks, SkinGPT-R1 delivers stable accuracy gains over Vision-R1 and remains competitive among strong vision-language models. Ablation results further show that DermCoT-based chain-of-thought supervision provides substantial improvements over the base model and that adding dermatology-aware visual distillation yields consistent additional gains in both narrative quality and recognition.
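To make the headline numbers concrete, the arithmetic behind a six-dimension average and a relative improvement can be sketched as follows. This is only an illustration: the function names and the six example scores are hypothetical, and the Vision-R1 value computed here is an approximation implied by the "about 41%" figure, not a score reported in the abstract.

```python
def mean_score(dimension_scores):
    """Average a list of per-dimension scores (each on a 1-5 scale)."""
    if not dimension_scores:
        raise ValueError("need at least one dimension score")
    return sum(dimension_scores) / len(dimension_scores)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, as a fraction (0.41 == 41%)."""
    return (new - baseline) / baseline


def implied_baseline(new, gain):
    """Baseline score implied by a reported score and a relative gain."""
    return new / (1.0 + gain)


# Hypothetical scores for six dimensions, for illustration only.
example_scores = [4.0, 4.5, 3.5, 4.0, 4.0, 4.0]
print(mean_score(example_scores))  # 4.0

# Figures quoted in the abstract: a 4.031 average, about 41% above Vision-R1.
# The baseline below is derived from those two numbers and is approximate.
baseline = implied_baseline(4.031, 0.41)
print(round(baseline, 2))  # roughly 2.86
```

Under these assumptions, a 41% relative gain corresponds to a gap of a little over one point on the five-point scale.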