🤖 AI Summary
This study addresses the limited generalization of existing skin lesion classification models, which stems from small-scale, low-diversity clinical datasets and class imbalance. To overcome these challenges, the authors propose a novel text-to-image generation framework based on Rectified Flow that, for the first time, integrates dermatological semantic descriptions with efficient image synthesis. Specifically, they employ Llama 3.2 to generate clinically compliant textual descriptions of lesions and apply parameter-efficient LoRA fine-tuning to Flux.1 to produce high-quality synthetic image-text pairs for data augmentation. A Vision Transformer (ViT) trained on only 2,500 real and 4,375 synthetic images achieves 78.04% accuracy and an AUC of 0.859, surpassing current state-of-the-art dermatological models by 8%, with gains of up to 6% from augmenting small real datasets and up to 9% over training on diffusion-based synthetic images.
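The parameter-efficient LoRA fine-tuning mentioned above can be illustrated with a minimal sketch of the low-rank update itself: the pretrained weight matrix stays frozen while two small trainable matrices add a rank-r correction. All names and dimensions below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of a Low-Rank Adaptation (LoRA) update of the kind
# used to fine-tune Flux.1: instead of updating a full weight matrix W,
# LoRA trains two small matrices A (r x d_in) and B (d_out x r) and
# adds their scaled product to the frozen weights.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8  # rank r << d_in, d_out

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B train,
    # so the number of trainable parameters is r * (d_in + d_out).
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal((2, d_in))
# With B zero-initialized, the adapted model exactly matches the
# pretrained model at the start of fine-tuning.
print(np.allclose(lora_forward(x), x @ W.T))  # True
```

Zero-initializing B is the standard LoRA choice: it guarantees the fine-tuned model starts from the pretrained behavior and only gradually departs from it.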
📄 Abstract
Despite recent advances in deep generative modeling, skin lesion classification systems remain constrained by the limited availability of large, diverse, and well-annotated clinical datasets, resulting in class imbalance between benign and malignant lesions and consequently reduced generalization performance. We introduce DermaFlux, a rectified flow-based text-to-image generative framework that synthesizes clinically grounded skin lesion images from natural language descriptions of dermatological attributes. Built upon Flux.1, DermaFlux is fine-tuned using parameter-efficient Low-Rank Adaptation (LoRA) on a large curated collection of publicly available clinical image datasets. We construct image-text pairs using synthetic textual captions generated by Llama 3.2, following established dermatological criteria including lesion asymmetry, border irregularity, and color variation. Extensive experiments demonstrate that DermaFlux generates diverse and clinically meaningful dermatology images that improve binary classification performance by up to 6% when augmenting small real-world datasets, and by up to 9% when classifiers are trained on DermaFlux-generated synthetic images rather than diffusion-based synthetic images. Our ImageNet-pretrained ViT fine-tuned with only 2,500 real images and 4,375 DermaFlux-generated samples achieves 78.04% binary classification accuracy and an AUC of 0.859, surpassing the next best dermatology model by 8%.
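The caption-construction step described in the abstract (prompting Llama 3.2 with established dermatological criteria such as asymmetry, border irregularity, and color variation) can be sketched as a simple prompt template. The function name, fields, and wording below are hypothetical illustrations, not the paper's actual prompt.

```python
# Hypothetical sketch of a prompt template for generating clinically
# grounded lesion captions from dermatological attributes (asymmetry,
# border irregularity, color variation), as a language model like
# Llama 3.2 might be instructed to do.
def build_caption_prompt(label, asymmetry, border, colors):
    """Assemble a structured caption request for one lesion image."""
    return (
        f"Describe a {label} skin lesion for a dermatology atlas. "
        f"Asymmetry: {asymmetry}. Border: {border}. "
        f"Colors present: {', '.join(colors)}."
    )

prompt = build_caption_prompt(
    "malignant melanoma",
    "asymmetric along both axes",
    "irregular, poorly defined",
    ["dark brown", "black", "red"],
)
print(prompt)
```

Pairing such attribute-conditioned captions with images is what turns an unlabeled image collection into the image-text pairs needed to fine-tune a text-to-image model.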