LesionGen: A Concept-Guided Diffusion Model for Dermatology Image Synthesis

📅 2025-07-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Real-world labeled dermatological data are scarce due to privacy constraints, high annotation costs, and insufficient demographic representation. Method: This paper proposes LesionGen, a framework that guides text-to-image diffusion models with structured, concept-rich clinical lesion descriptions (rather than coarse disease labels), fine-tuning a pretrained model on expert-annotated clinical concepts and pseudo-generated, concept-guided reports. Contribution/Results: LesionGen synthesizes high-fidelity, diverse skin lesion images conditioned on meaningful dermatological descriptions. Experiments show that classifiers trained solely on synthetic data achieve accuracy on real test sets comparable to those trained on full real-data benchmarks, with notably improved performance on the worst-case subgroups. These results support LesionGen's clinical relevance and generalization, addressing critical data-scarcity challenges in dermatological AI.

📝 Abstract
Deep learning models for skin disease classification require large, diverse, and well-annotated datasets. However, such resources are often limited due to privacy concerns, high annotation costs, and insufficient demographic representation. While text-to-image diffusion probabilistic models (T2I-DPMs) offer promise for medical data synthesis, their use in dermatology remains underexplored, largely due to the scarcity of rich textual descriptions in existing skin image datasets. In this work, we introduce LesionGen, a clinically informed T2I-DPM framework for dermatology image synthesis. Unlike prior methods that rely on simplistic disease labels, LesionGen is trained on structured, concept-rich dermatological captions derived from expert annotations and pseudo-generated, concept-guided reports. By fine-tuning a pretrained diffusion model on these high-quality image-caption pairs, we enable the generation of realistic and diverse skin lesion images conditioned on meaningful dermatological descriptions. Our results demonstrate that models trained solely on our synthetic dataset achieve classification accuracy comparable to those trained on real images, with notable gains in worst-case subgroup performance. Code and data are available here.
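The abstract's key idea is turning structured expert annotations into concept-rich captions that condition the diffusion model. Below is a minimal sketch of that caption-construction step, not taken from the paper's code; the concept fields (`morphology`, `color`, `border`, `body_site`) and the caption template are hypothetical illustrations of what "structured, concept-rich dermatological captions" might look like.

```python
# Sketch (assumptions, not the paper's implementation): compose a
# concept-rich caption from structured lesion annotations, suitable as
# the text condition for fine-tuning a text-to-image diffusion model.

def build_caption(annotation: dict) -> str:
    """Join a diagnosis label with available clinical concepts into one caption."""
    parts = [f"a dermatoscopic image of {annotation['diagnosis']}"]
    # Hypothetical concept fields; only those present in the annotation are used.
    for concept in ("morphology", "color", "border", "body_site"):
        value = annotation.get(concept)
        if value:
            parts.append(f"{concept.replace('_', ' ')}: {value}")
    return ", ".join(parts)

example = {
    "diagnosis": "melanoma",
    "morphology": "asymmetric plaque",
    "color": "variegated brown and black",
    "border": "irregular",
    "body_site": "upper back",
}
print(build_caption(example))
# → a dermatoscopic image of melanoma, morphology: asymmetric plaque,
#   color: variegated brown and black, border: irregular, body site: upper back
```

Such captions would replace the coarse label ("melanoma") as the text prompt during fine-tuning, which is what the paper contrasts against prior label-only conditioning.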
Problem

Research questions and friction points this paper is trying to address.

Lack of large, diverse, well-annotated dermatology datasets
Underexplored use of text-to-image diffusion models in dermatology
Need for realistic, diverse synthetic skin lesion images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-guided diffusion model for dermatology image synthesis
Structured, concept-rich captions from expert annotations and pseudo-reports
Classifiers trained on synthetic data match real-data accuracy