Entropy Rectifying Guidance for Diffusion and Flow Models

📅 2025-04-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing guidance methods for diffusion and flow models, such as classifier-free guidance (CFG), struggle to simultaneously improve image quality, sample diversity, and prompt consistency, typically trading one objective off against the others. This paper proposes Entropy Rectifying Guidance (ERG), a lightweight, inference-time guidance mechanism that requires no auxiliary models and no additional forward passes per sampling step. ERG works by modifying the attention mechanism of state-of-the-art diffusion transformer architectures at inference time, and, unlike CFG, it applies to unconditional as well as conditional sampling. It can also be combined with recent guidance techniques such as CADS and APG. Evaluated on text-to-image, class-conditional, and unconditional image generation, ERG yields simultaneous improvements in image quality, diversity, and prompt consistency, easing the traditional trade-off among these objectives.

📝 Abstract
Guidance techniques are commonly used in diffusion and flow models to improve image quality and consistency for conditional generative tasks such as class-conditional and text-to-image generation. In particular, classifier-free guidance (CFG) -- the most widely adopted guidance technique -- contrasts conditional and unconditional predictions to improve the generated images. This results, however, in trade-offs across quality, diversity and consistency, improving some at the expense of others. While recent work has shown that it is possible to disentangle these factors to some extent, such methods come with an overhead of requiring an additional (weaker) model, or require more forward passes per sampling step. In this paper, we propose Entropy Rectifying Guidance (ERG), a simple and effective guidance mechanism based on inference-time changes in the attention mechanism of state-of-the-art diffusion transformer architectures, which allows for simultaneous improvements over image quality, diversity and prompt consistency. ERG is more general than CFG and similar guidance techniques, as it extends to unconditional sampling. ERG results in significant improvements in various generation tasks such as text-to-image, class-conditional and unconditional image generation. We also show that ERG can be seamlessly combined with other recent guidance methods such as CADS and APG, further boosting generation performance.
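The abstract describes CFG as contrasting conditional and unconditional predictions. For context, the standard CFG combination rule (this is the well-known formulation from the classifier-free guidance literature, not code from this paper) can be sketched as:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one with guidance scale w.
    w = 1 recovers the plain conditional prediction; w > 1 strengthens
    the condition at the cost of diversity."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy example with dummy noise predictions.
eps_u = np.array([0.0, 1.0])
eps_c = np.array([1.0, 1.0])
print(cfg_combine(eps_u, eps_c, 2.0))  # [2. 1.]
```

Note that each sampling step under CFG needs both a conditional and an unconditional forward pass; ERG avoids this overhead by acting on the attention mechanism instead.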
Problem

Research questions and friction points this paper is trying to address.

Improving image quality, diversity, and consistency in diffusion models
Reducing trade-offs in classifier-free guidance techniques
Enhancing unconditional and conditional sampling without extra overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy Rectifying Guidance for diffusion models
Modifies attention mechanism during inference
Improves quality, diversity, and prompt consistency
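The summary above refers to the entropy of attention maps as the guidance signal. The paper's exact rectification rule is not given here, but the underlying quantity, the Shannon entropy of each query's attention distribution, is standard and can be sketched as follows (function name and shapes are illustrative assumptions):

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy of each row of an attention map.
    attn: array of shape (num_queries, num_keys) whose rows sum to 1.
    Returns one entropy value per query; uniform rows give the maximum
    value log(num_keys), sharply peaked rows give values near 0."""
    eps = 1e-12  # avoid log(0) for zero attention weights
    return -np.sum(attn * np.log(attn + eps), axis=-1)

uniform = np.full((1, 4), 0.25)                  # maximally spread attention
peaked = np.array([[0.97, 0.01, 0.01, 0.01]])    # nearly one-hot attention
print(attention_entropy(uniform))  # ≈ ln(4) ≈ 1.386
print(attention_entropy(peaked))   # much smaller
```

A signal like this is computed anyway during the forward pass of a diffusion transformer, which is consistent with the paper's claim of no auxiliary model and no extra sampling steps.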
🔎 Similar Papers
2024-04-19 · Neural Information Processing Systems · Citations: 14
2022-09-02 · ACM Computing Surveys · Citations: 1628