🤖 AI Summary
Existing pre-trained text-to-image diffusion models exhibit weak generation capability in low-density semantic regions—i.e., rare “minority classes” described by sparse or infrequent textual prompts—limiting their utility in data augmentation and creative generation. To address this, we propose a fine-tuning-free online prompt optimization framework, introducing the first prompt optimization paradigm specifically designed for minority-class generation. Our method employs a likelihood-driven objective function to overcome the inherent bias of standard samplers (e.g., Classifier-Free Guidance) toward high-density semantic regions. It integrates online gradient-based optimization, conditional likelihood modeling, latent-space perturbation constraints, and a plug-and-play multi-model prompt adapter. Evaluated on mainstream models including Stable Diffusion, our approach achieves an 18.7% reduction in FID and a 23.4% improvement in CLIP-Score, significantly enhancing both fidelity and diversity of minority-class image generation. The code is open-sourced and has been widely reproduced.
📝 Abstract
We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. Minority instances, in the context of T2I generation, can be defined as ones living on low-density regions of text-conditional data distributions. They are valuable for various applications of modern T2I generators, such as data augmentation and creative AI. Unfortunately, existing pretrained T2I diffusion models primarily focus on high-density regions, largely due to the influence of guided samplers (like CFG) that are essential for high-quality generation. To address this, we present a novel framework to counter the high-density-focus of T2I diffusion models. Specifically, we first develop an online prompt optimization framework that encourages emergence of desired properties during inference while preserving semantic contents of user-provided prompts. We subsequently tailor this generic prompt optimizer into a specialized solver that promotes generation of minority features by incorporating a carefully-crafted likelihood objective. Extensive experiments conducted across various types of T2I models demonstrate that our approach significantly enhances the capability to produce high-quality minority instances compared to existing samplers. Code is available at https://github.com/soobin-um/MinorityPrompt.