🤖 AI Summary
Existing dermatological datasets lack concept-level metadata and clinically grounded natural-language descriptions, hindering the interpretability of multimodal large models in skin diagnosis. To address this, we introduce SkinCaRe, the first dermatology-specific multimodal dataset integrating medical images, fine-grained natural-language descriptions, and clinical chain-of-thought (CoT) reasoning, comprising 7,041 cases annotated by board-certified dermatologists and validated through multiple rounds of clinical review. We propose a dual-component framework (SkinCAP + SkinCoT), pioneering a hierarchical, clinically verifiable CoT annotation paradigm supported by a six-dimensional quality assessment and iterative refinement pipeline. Built upon the Fitzpatrick 17k and Diverse Dermatology Images datasets with targeted expansion, SkinCaRe is publicly released on Hugging Face. Experiments demonstrate substantial improvements in vision-language models' lesion-description accuracy, diagnostic interpretability, and clinical credibility, establishing a new benchmark for multimodal medical AI.
📝 Abstract
With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision large language models (VLLMs), in skin disease diagnosis, interpretability has become crucial. However, existing dermatology datasets include few concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency impedes the advancement of LLM-based methods in dermatologic diagnosis. To address this gap and provide a meticulously annotated dermatology dataset with comprehensive natural-language descriptions, we introduce **SkinCaRe**, a comprehensive multimodal resource that unifies *SkinCAP* and *SkinCoT*. **SkinCAP** comprises 4,000 images sourced from the Fitzpatrick 17k skin disease dataset and the Diverse Dermatology Images dataset, annotated by board-certified dermatologists with extensive medical descriptions and captions. In addition, we introduce **SkinCoT**, a curated dataset pairing 3,041 dermatologic images with clinician-verified, hierarchical chain-of-thought (CoT) diagnoses. Each diagnostic narrative is rigorously evaluated against six quality criteria and iteratively refined until it meets a predefined standard of clinical accuracy and explanatory depth. Together, SkinCAP (captioning) and SkinCoT (reasoning), collectively referred to as SkinCaRe, encompass 7,041 expertly curated dermatologic cases and provide a unified and trustworthy resource for training multimodal models that both describe and explain dermatologic images. SkinCaRe is publicly available at https://huggingface.co/datasets/yuhos16/SkinCaRe.