🤖 AI Summary
This work addresses a key limitation of existing Classifier-Free Guidance (CFG) methods: they rely on fixed or heuristically scheduled guidance weights, with no rigorous characterization of how the discrepancy between conditional and unconditional score estimates changes over the course of diffusion. The authors analyze how this score difference evolves across timesteps and derive strict upper bounds on it, which motivate an exponential decay guidance strategy aligned with the underlying diffusion dynamics. The proposed method is training-free and plug-in, and it is orthogonal to existing strategies, so it can be combined with current CFG frameworks. Experiments across diverse generative tasks indicate that the approach is effective and broadly applicable.
📝 Abstract
Classifier-Free Guidance (CFG) is a cornerstone of modern conditional diffusion models, yet its reliance on fixed or heuristically scheduled guidance weights is predominantly empirical and overlooks the inherent dynamics of the diffusion process. In this paper, we provide a rigorous theoretical analysis of CFG. Specifically, we establish strict upper bounds on the score discrepancy between the conditional and unconditional distributions at different timesteps of the diffusion process. This finding explains the limitations of fixed-weight strategies and establishes a principled foundation for time-dependent guidance. Motivated by this insight, we introduce Control Classifier-Free Guidance (C$^2$FG), a novel, training-free, plug-in method that aligns the guidance strength with the diffusion dynamics via an exponential decay control function. Extensive experiments demonstrate that C$^2$FG is effective and broadly applicable across diverse generative tasks, while also exhibiting orthogonality to existing strategies.
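To make the idea of time-dependent guidance concrete, the following is a minimal sketch of standard CFG mixing with a guidance weight that decays exponentially in the timestep. The abstract does not give the form of the C$^2$FG control function, so everything specific here is an illustrative assumption rather than the authors' method: the schedule `exp_decay_weight`, its parameters `w_max`, `w_min`, and `decay_rate`, the normalization of `t` to [0, 1], and the direction of the decay.

```python
import numpy as np


def exp_decay_weight(t, w_max=7.5, w_min=1.0, decay_rate=3.0):
    """Hypothetical guidance schedule (an assumption, not the paper's formula).

    t is a normalized timestep in [0, 1]. The weight starts at w_max for t = 0
    and decays exponentially toward w_min as t grows.
    """
    return w_min + (w_max - w_min) * np.exp(-decay_rate * t)


def guided_prediction(eps_uncond, eps_cond, w):
    """Standard CFG combination of unconditional and conditional predictions."""
    return eps_uncond + w * (eps_cond - eps_uncond)


# Stand-in model outputs; a real sampler would call the denoiser twice per step.
rng = np.random.default_rng(0)
eps_uncond = rng.standard_normal(4)
eps_cond = rng.standard_normal(4)

for t in (0.0, 0.5, 1.0):
    w = exp_decay_weight(t)
    eps = guided_prediction(eps_uncond, eps_cond, w)
    print(f"t={t:.1f}  w={w:.3f}  eps={np.round(eps, 3)}")
```

A fixed-weight baseline corresponds to passing the same `w` at every step. Because only the weight changes, a schedule of this kind can be dropped into an existing CFG sampler without retraining, which is the sense in which the abstract calls the method training-free and plug-in.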