🤖 AI Summary
Problem: Existing dermatological AI models suffer from dataset biases: training data underrepresents real-world outpatient settings, skin tone diversity, and non-Western populations. Method: We introduce DermaCon-IN, the first outpatient-oriented, multi-concept skin image dataset from India (5,450+ images, 240+ diagnoses), annotated using an etiology-driven hierarchical labeling scheme adapted from Rook's classification. The scheme combines anatomical location, clinical concepts, and diagnoses at multiple levels of granularity so that labels reflect clinical practice in the region. We benchmark convolutional models (ResNet, DenseNet, EfficientNet), transformer-based models (ViT, MaxViT, Swin), and Concept Bottleneck Models (CBMs) for concept-guided learning. Contribution/Results: The benchmarks establish baseline performance and explore how anatomical and concept-level supervision can be integrated into more interpretable models. Together, the dataset, annotation framework, and CBM evaluation provide a foundation for interpretable, clinically realistic dermatology AI in real-world settings.
📝 Abstract
Artificial intelligence is poised to augment dermatological care by enabling scalable image-based diagnostics. Yet, the development of robust and equitable models remains hindered by datasets that fail to capture the clinical and demographic complexity of real-world practice. This complexity stems from region-specific disease distributions, wide variation in skin tones, and the underrepresentation of outpatient scenarios from non-Western populations. We introduce DermaCon-IN, a prospectively curated dermatology dataset comprising over 5,450 clinical images from approximately 3,000 patients across outpatient clinics in South India. Each image is annotated by board-certified dermatologists with over 240 distinct diagnoses, structured under a hierarchical, etiology-based taxonomy adapted from Rook's classification. The dataset captures a wide spectrum of dermatologic conditions and tonal variation commonly seen in Indian outpatient care. We benchmark a range of architectures including convolutional models (ResNet, DenseNet, EfficientNet), transformer-based models (ViT, MaxViT, Swin), and Concept Bottleneck Models to establish baseline performance and explore how anatomical and concept-level cues may be integrated. These results are intended to guide future efforts toward interpretable and clinically realistic models. DermaCon-IN provides a scalable and representative foundation for advancing dermatology AI in real-world settings.
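For readers unfamiliar with Concept Bottleneck Models, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes image features have already been extracted by some backbone, and every name, weight, and dimension here is hypothetical and chosen only for illustration. The model first predicts clinical concepts from the features, then predicts the diagnosis from those concepts alone, which lets a clinician inspect or override a concept before the final prediction.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Sequence[Sequence[float]], bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Compute W @ x + b for a small dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class ConceptBottleneck:
    """Two-stage model: features -> concepts -> diagnosis (illustrative only)."""

    def __init__(self, w_concept, b_concept, w_label, b_label):
        self.w_concept, self.b_concept = w_concept, b_concept
        self.w_label, self.b_label = w_label, b_label

    def predict_concepts(self, features: Sequence[float]) -> List[float]:
        # Each concept is an independent yes/no probability.
        return [sigmoid(z)
                for z in linear(self.w_concept, self.b_concept, features)]

    def predict_label(self, concepts: Sequence[float]) -> List[float]:
        # The diagnosis head sees only the concepts, never the raw features.
        return softmax(linear(self.w_label, self.b_label, concepts))

    def forward(self, features: Sequence[float],
                concept_override: Optional[Dict[int, float]] = None
                ) -> Tuple[List[float], List[float]]:
        concepts = self.predict_concepts(features)
        if concept_override is not None:
            # Test-time intervention: replace selected concept values.
            for index, value in concept_override.items():
                concepts[index] = value
        return concepts, self.predict_label(concepts)


# Hypothetical setup: 3 backbone features, 2 concepts, 2 diagnosis classes.
model = ConceptBottleneck(
    w_concept=[[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]],
    b_concept=[0.0, -1.0],
    w_label=[[2.0, -1.0], [-1.0, 2.0]],
    b_label=[0.0, 0.0],
)
features = [0.5, 0.2, 0.1]
concepts, probs = model.forward(features)
# Override both concepts, as a clinician correcting the model might.
_, corrected = model.forward(features, concept_override={0: 0.0, 1: 1.0})
```

In a real pipeline the feature vector would come from one of the benchmarked backbones, and the concept layer would be trained against dermatologist-annotated concepts. The override path is what makes the bottleneck useful for interpretability: changing a concept value changes the diagnosis in a traceable way.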