🤖 AI Summary
Diffusion models produce markedly lower-fidelity images when generating rare concepts, i.e., concepts whose prompts appear infrequently in the training data. To address this, we propose RAP, the first framework that formulates rare-concept synthesis as navigation along a causal path in latent space, from common concepts to the rare target. Our key contributions are: (1) a dynamic prompt-switching mechanism guided by score similarity, enabling semantics-aware prompt scheduling; (2) a reinterpretation of prompt alternation as a second-order denoising process, which keeps semantics coherent across prompt transitions; and (3) a lightweight, backbone-agnostic adapter design compatible with diverse diffusion architectures. Experiments on mainstream models, including SDXL and SD1.5, show that RAP consistently outperforms prior methods on both automated metrics (CLIP-Score, DINOv2 similarity) and human evaluation, especially for rare objects, attributes, and compositional concepts.
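The score-similarity switching idea in contribution (1) can be illustrated with a toy sketch. This is not RAP's implementation. It assumes two hypothetical 2-D Gaussian "concepts" whose scores stand in for the model's noise predictions under a frequent proxy prompt and the rare target prompt, a hypothetical cosine-similarity threshold `tau`, and plain gradient steps in place of a real diffusion sampler. Sampling follows the frequent prompt while the two scores still point in similar directions, then switches permanently to the rare prompt once their cosine similarity drops below `tau`.

```python
import numpy as np

def gaussian_score(x, mu, sigma=1.0):
    # Score (gradient of the log-density) of an isotropic Gaussian N(mu, sigma^2 I).
    return -(x - mu) / sigma**2

def cosine(a, b, eps=1e-12):
    # Cosine similarity; eps guards against a zero-length score vector.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def switched_sampling(x0, mu_frequent, mu_rare, tau=0.95, eta=0.1, steps=200):
    """Follow the frequent-concept score while it agrees with the rare-concept
    score (cosine >= tau); once it drops below tau, follow the rare-concept
    score for all remaining steps. Returns the final point and the switch step.
    All names and parameters here are hypothetical, chosen for illustration."""
    x = np.asarray(x0, dtype=float).copy()
    switch_step = None
    for t in range(steps):
        s_freq = gaussian_score(x, mu_frequent)
        s_rare = gaussian_score(x, mu_rare)
        if switch_step is None and cosine(s_freq, s_rare) < tau:
            switch_step = t  # proxy guidance has diverged from the target
        s = s_rare if switch_step is not None else s_freq
        x = x + eta * s  # simple gradient step in place of a sampler update
    return x, switch_step

mu_f = np.array([1.0, 0.0])   # hypothetical frequent-concept mean
mu_r = np.array([1.0, 0.5])   # hypothetical rare-concept mean
x_final, t_switch = switched_sampling([-10.0, 0.0], mu_f, mu_r)
print(t_switch, np.round(x_final, 4))
```

In this toy run the trajectory first heads toward the frequent concept, the switch fires at step 19 when the two score directions start to disagree, and the sample then converges to the rare concept's mean. In a real pipeline the two scores would presumably come from the denoiser evaluated with each prompt at the same noisy latent, and the threshold would need tuning per model.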
📝 Abstract
Diffusion models have shown strong capabilities in high-fidelity image generation but often falter when synthesizing rare concepts, i.e., prompts that are infrequently observed in the training distribution. In this paper, we introduce RAP, a principled framework that treats rare concept generation as navigating a latent causal path: a progressive, model-aligned trajectory through the generative space from frequent concepts to rare targets. Rather than relying on heuristic prompt alternation, we theoretically justify that rare prompt guidance can be approximated by semantically related frequent prompts. We then formulate prompt switching as a dynamic process based on score similarity, enabling adaptive stage transitions. Furthermore, we reinterpret prompt alternation as a second-order denoising mechanism, promoting smooth semantic progression and coherent visual synthesis. Through this causal lens, we align input scheduling with the model's internal generative dynamics. Experiments across diverse diffusion backbones demonstrate that RAP consistently enhances rare concept generation, outperforming strong baselines in both automated evaluations and human studies.
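The second-order reading of prompt alternation can be related to Heun's method, a standard second-order ODE integrator. The sketch below is an assumption-laden illustration, not the paper's derivation: it uses a toy flow `mu - x` toward a hypothetical concept mean in place of a diffusion model's probability-flow ODE, and lets the predictor and corrector slopes of a single Heun step be evaluated under different prompts. With the same prompt in both slots it reduces to standard Heun, which the example checks against the exact solution and a first-order Euler baseline.

```python
import numpy as np

def flow(x, mu):
    # Toy flow toward a concept mean: the score of N(mu, I) is mu - x.
    return mu - x

def euler_step(x, h, mu):
    # First-order baseline: one slope evaluation per step.
    return x + h * flow(x, mu)

def heun_step(x, h, mu_predict, mu_correct):
    """Heun (second-order) step whose predictor slope uses one prompt's field
    and whose corrector slope uses another (a hypothetical pairing). Passing
    the same mean twice gives the standard Heun method."""
    d1 = flow(x, mu_predict)          # slope at the current point
    x_pred = x + h * d1               # Euler predictor
    d2 = flow(x_pred, mu_correct)     # slope at the predicted point
    return x + 0.5 * h * (d1 + d2)    # trapezoidal corrector

def integrate(step, x0, h, n, *mus):
    # Apply `step` n times with step size h.
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        x = step(x, h, *mus)
    return x

h = 0.1
x0 = np.array([4.0, 3.0])
mu_f = np.array([1.0, -2.0])   # hypothetical frequent-concept mean
mu_r = np.array([1.5, -1.0])   # hypothetical rare-concept mean

# Same prompt in both slots: compare with the exact solution at t = 1.
exact = mu_f + (x0 - mu_f) * np.exp(-1.0)
err_euler = np.linalg.norm(integrate(euler_step, x0, h, 10, mu_f) - exact)
err_heun = np.linalg.norm(integrate(heun_step, x0, h, 10, mu_f, mu_f) - exact)

# Different prompts in predictor and corrector: iterate to the fixed point.
x_mix = integrate(heun_step, x0, h, 200, mu_f, mu_r)
blend = ((1 - h) * mu_f + mu_r) / (2 - h)
```

With two different means, the alternating update in this toy settles at ((1 - h) * mu_f + mu_r) / (2 - h), a blend of the two concepts whose weighting depends on the step size h. This is a property of the toy linear field only, not a claim about RAP's behavior on real models.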