🤖 AI Summary
Fixed Classifier-Free Guidance (CFG) scales force a trade-off between image quality and text alignment in text-to-image diffusion models. To address this, the work proposes an annealing-based dynamic guidance mechanism that introduces no additional parameters or memory overhead. The method learns an adaptive scheduling policy from the conditional noise signal, adjusting the CFG scale at each denoising step to jointly optimize generation stability and semantic fidelity. Experiments on multiple benchmarks demonstrate significant improvements in quantitative metrics, including FID and CLIP Score, while preserving inference efficiency. The approach also substantially reduces CFG's sensitivity to hyperparameter tuning, yielding a more robust and efficient sampling-guidance paradigm for controllable image generation.
📝 Abstract
Denoising diffusion models excel at generating high-quality images conditioned on text prompts, yet their effectiveness relies heavily on careful guidance during the sampling process. Classifier-Free Guidance (CFG) is a widely used mechanism for steering generation via a guidance scale that balances image quality against prompt alignment. However, the choice of guidance scale critically affects convergence toward a visually appealing, prompt-adherent image. In this work, we propose an annealing guidance scheduler that dynamically adjusts the guidance scale over time based on the conditional noise signal. By learning a scheduling policy, our method tames the temperamental behavior of CFG. Empirical results demonstrate that our guidance scheduler significantly enhances image quality and alignment with the text prompt, advancing the performance of text-to-image generation. Notably, the scheduler requires no additional activations or memory consumption, and can seamlessly replace standard classifier-free guidance, offering an improved trade-off between prompt alignment and quality.
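To make the mechanism concrete, here is a minimal sketch of a CFG step with a time-varying guidance scale. The `cfg_step` formula is the standard classifier-free guidance extrapolation; the `annealed_scale` schedule (a linear decay between assumed bounds `w_max` and `w_min`) is purely illustrative, not the learned policy the paper proposes:

```python
import numpy as np

def cfg_step(eps_cond, eps_uncond, w):
    """Standard CFG: extrapolate from the unconditional noise
    prediction toward the conditional one with scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def annealed_scale(t, T, w_max=7.5, w_min=1.0):
    """Illustrative hand-crafted schedule: strong guidance at the
    noisy start of sampling (t = T), weak at the end (t = 0).
    The paper learns this policy from the conditional noise signal."""
    return w_min + (w_max - w_min) * (t / T)

# Toy usage with stand-in noise predictions.
T = 50
eps_c = np.ones(4)   # stand-in conditional prediction
eps_u = np.zeros(4)  # stand-in unconditional prediction
for t in (T, T // 2, 0):
    w = annealed_scale(t, T)
    eps = cfg_step(eps_c, eps_u, w)
```

With `w = 1` the step reduces to the plain conditional prediction; larger `w` pushes the sample further toward the prompt, which is exactly the quality/alignment dial the scheduler controls over time.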