AI Summary
Existing conditional diffusion models suffer from a disconnect between the theory and practice of their guidance mechanisms, which limits generation performance. This work first reveals the theoretical inefficacy of conventional guidance objectives, such as Classifier-Free Guidance (CFG), under unconstrained settings where no lookahead constraints are imposed. To bridge this gap, we propose Rectified Gradient Guidance (REG), a novel gradient-correction-based guidance paradigm. REG reweights the joint distribution and applies a theoretically grounded gradient correction to approximate the unconstrained optimal solution, while remaining plug-and-play compatible with existing guidance methods. Evaluated on ImageNet class-conditional generation and multi-scale text-to-image synthesis, REG consistently improves FID (by 2.1 to 4.3), Inception Score (by 0.8 to 1.9), and CLIP Score (by 0.07 to 0.12). Our approach advances both the theoretical foundations and practical efficacy of diffusion guidance, unifying theory and practice.
Abstract
Guidance techniques are simple yet effective for improving conditional generation in diffusion models. Despite their empirical success, the practical implementation of guidance diverges significantly from its theoretical motivation. In this paper, we reconcile this discrepancy by replacing the scaled marginal distribution target, which we prove theoretically invalid, with a valid scaled joint distribution objective. Additionally, we show that established guidance implementations are approximations to the intractable optimal solution under no future foresight constraints. Building on these theoretical insights, we propose rectified gradient guidance (REG), a versatile enhancement designed to boost the performance of existing guidance methods. Experiments in 1D and 2D settings demonstrate that REG provides a better approximation to the optimal solution than prior guidance techniques, validating the proposed theoretical framework. Extensive experiments on class-conditional ImageNet and text-to-image generation tasks show that incorporating REG consistently improves FID and Inception/CLIP scores across various settings.
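As background for the guidance implementations the abstract refers to, the standard classifier-free guidance update extrapolates from an unconditional noise prediction toward a conditional one. Below is a minimal sketch of that combination; the function name and toy values are illustrative and not from the paper, and REG's gradient correction would further adjust the guided direction rather than use this rule as-is.

```python
def cfg_noise_prediction(eps_uncond, eps_cond, w):
    """Classifier-free guidance combination:
        eps_guided = eps_uncond + w * (eps_cond - eps_uncond)
    w = 0 recovers the unconditional model, w = 1 the conditional
    model, and w > 1 amplifies the conditioning signal."""
    return [eu + w * (ec - eu) for eu, ec in zip(eps_uncond, eps_cond)]

# Toy noise predictions standing in for a denoiser's outputs.
eps_uncond = [0.1, -0.2, 0.3]
eps_cond = [0.2, -0.1, 0.5]

guided = cfg_noise_prediction(eps_uncond, eps_cond, w=2.0)
print(guided)  # approximately [0.3, 0.0, 0.7]
```

In practice this combination is applied to the model's noise (or score) prediction at every sampling step; the paper's point is that the common choice of a scaled marginal target motivating it is theoretically invalid, and that REG better approximates the unconstrained optimum.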