🤖 AI Summary
This work addresses the challenge of model bias in skin lesion classification caused by the scarcity of malignant samples. To mitigate class imbalance, the authors propose a novel framework that integrates class-conditional diffusion-based image generation, masked autoencoder (MAE) self-supervised pretraining, and knowledge distillation. High-quality synthetic images of malignant lesions are first generated using a conditional diffusion model. A Vision Transformer (ViT) is then pretrained via MAE on both real and synthetic data to learn robust representations. Finally, knowledge distillation transfers the learned representations from the large ViT to a lightweight student network, enabling high classification performance with efficient mobile deployment. The study presents the first systematic integration of generative modeling, self-supervised learning, and model compression for this task, offering an effective approach to few-shot medical image classification.
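The MAE pretraining stage described above relies on masking a large fraction of image patches and training the ViT to reconstruct them. A minimal sketch of the patch-masking step, assuming the standard 75% mask ratio from the original MAE recipe (the function name and ratio are illustrative, not taken from this paper):

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float = 0.75, seed: int = 0):
    """Choose which patches the MAE encoder sees vs. must reconstruct.

    Returns (keep_idx, mask), where mask[i] == 1 marks a masked-out patch.
    The 0.75 mask ratio is an assumption, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    # Random permutation; the first num_keep indices stay visible to the encoder
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(num_patches, dtype=np.int64)
    mask[keep_idx] = 0
    return keep_idx, mask

# 224x224 image with 16x16 patches -> 14*14 = 196 patches
keep_idx, mask = random_patch_mask(num_patches=196)
print(len(keep_idx), int(mask.sum()))  # 49 visible, 147 masked
```

Only the visible patches are fed to the encoder, which is what makes MAE pretraining cheap enough to run on the combined real-plus-synthetic dataset.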
📝 Abstract
Skin lesion classification datasets often suffer from severe class imbalance, with malignant cases significantly underrepresented, leading to biased decision boundaries during deep learning training. We address this challenge using class-conditional diffusion models to generate synthetic dermatological images, followed by self-supervised MAE pretraining that enables large ViT models to learn robust, domain-relevant features. To support deployment in practical clinical settings, where lightweight models are required, we apply knowledge distillation to transfer these representations to a smaller ViT student suitable for mobile devices. Our results show that MAE pretraining on synthetic data, combined with distillation, improves classification performance while enabling efficient on-device inference.
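The distillation step in the abstract can be sketched as a standard Hinton-style objective: the student is trained on a blend of soft teacher targets and the hard ground-truth labels. The temperature and mixing weight below are illustrative assumptions, not hyperparameters reported by the paper:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-target loss (teacher) with cross-entropy on true labels.

    T (temperature) and alpha (mixing weight) are hypothetical values.
    """
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    # Soft-target cross-entropy (equals KL up to a constant), scaled by T^2
    # so gradient magnitudes stay comparable across temperatures
    kd = -(p_teacher * log_p_student).sum(axis=-1).mean() * (T ** 2)
    # Standard cross-entropy against the ground-truth labels
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * kd + (1 - alpha) * ce

# Toy batch: 1 sample, 3 classes (e.g. benign / malignant / other)
student = np.array([[2.0, 0.5, -1.0]])
teacher = np.array([[1.8, 0.3, -0.9]])
loss = distillation_loss(student, teacher, labels=np.array([0]))
```

In training, only the lightweight student is updated against this loss, so the deployed mobile model inherits the representations of the large MAE-pretrained ViT without its inference cost.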