🤖 AI Summary
To address insufficient generation diversity for tail classes, and compromised fidelity for head classes, in class-conditional diffusion models trained on long-tailed data, this paper proposes a contrastive learning-based tail-class enhancement method. It introduces a conditional–unconditional generation alignment mechanism into diffusion models and designs two lightweight contrastive losses: an unsupervised InfoNCE loss that uses negative samples, and a conditional–unconditional MSE contrastive loss applied during the high-timestep (early) denoising stages to enable knowledge transfer from head to tail classes. Extensive experiments on CIFAR10-LT, CIFAR100-LT, PlacesLT, TinyImageNetLT, and ImageNetLT demonstrate that the method significantly improves tail-class generation diversity while preserving both the fidelity and diversity of head-class images, and that it consistently outperforms standard DDPM and existing long-tailed generative approaches across all benchmarks.
📝 Abstract
Training data for class-conditional image synthesis often exhibit a long-tailed distribution with few images for tail classes. Such imbalance causes mode collapse and reduces the diversity of synthesized images for tail classes. For class-conditional diffusion models trained on imbalanced data, we aim to improve the diversity of tail-class images without compromising the fidelity and diversity of head-class images. We achieve this by introducing two deceptively simple but highly effective contrastive loss functions. First, we employ an unsupervised InfoNCE loss that uses negative samples to increase the dissimilarity among synthetic images, particularly for tail classes. To further enhance tail-class diversity, our second loss is an MSE loss that contrasts class-conditional generation with unconditional generation at large timesteps. This second loss makes the denoising process insensitive to class conditions during the initial steps, which enriches tail classes through knowledge sharing from head classes. Conditional–unconditional alignment has been shown to enhance the performance of long-tailed GANs; we are the first to adapt such alignment to diffusion models, and the first to successfully leverage contrastive learning for class-imbalanced diffusion models. Our contrastive learning framework is easy to implement and outperforms standard DDPM and alternative methods for class-imbalanced diffusion models across various datasets, including CIFAR10/100-LT, PlacesLT, TinyImageNetLT, and ImageNetLT.
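The two losses described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the temperature, and the timestep threshold are hypothetical, and in practice the InfoNCE loss would operate on learned image embeddings while the MSE loss would compare the model's conditional and unconditional noise predictions.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Unsupervised InfoNCE: each row of z1 is pulled toward the matching
    row of z2 (its positive) and pushed away from all other rows (negatives),
    which increases dissimilarity among synthetic images."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # (N, N); diagonal = positives
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def cond_uncond_mse(eps_cond, eps_uncond, t, t_thresh):
    """MSE between conditional and unconditional noise predictions, applied
    only at large timesteps (t >= t_thresh), so early denoising steps become
    insensitive to the class condition and head-class knowledge is shared."""
    mask = (t >= t_thresh).astype(float)           # (N,) selects large timesteps
    axes = tuple(range(1, eps_cond.ndim))
    per_sample = ((eps_cond - eps_uncond) ** 2).mean(axis=axes)
    return (mask * per_sample).sum() / max(mask.sum(), 1.0)
```

A training step would combine these two terms with the standard DDPM denoising loss; the relative weights are hyperparameters not specified in the abstract.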