Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

📅 2024-10-03
🏛️ International Conference on Learning Representations
📈 Citations: 9
Influential: 1
🤖 AI Summary
High guidance scales in diffusion models improve generation quality and alignment with the input condition, but they often cause oversaturation and unrealistic artifacts. To address this, the paper proposes adaptive projected guidance (APG), which decomposes the classifier-free guidance (CFG) update into two components: one parallel to the conditional model prediction, which primarily causes oversaturation, and one orthogonal to it, which enhances image quality. APG down-weights the parallel component while preserving the orthogonal one. The paper also draws a connection between CFG and gradient ascent and, based on it, introduces a rescaling and momentum method for the update rule. APG is plug-and-play, adds practically no computational overhead, and is compatible with various conditional diffusion models and samplers: FID decreases by 3.2%, Recall increases by 4.7%, and saturation scores improve, while precision remains comparable to standard CFG.

📝 Abstract
Classifier-free guidance (CFG) is crucial for improving both generation quality and alignment between the input condition and final output in diffusion models. While a high guidance scale is generally required to enhance these aspects, it also causes oversaturation and unrealistic artifacts. In this paper, we revisit the CFG update rule and introduce modifications to address this issue. We first decompose the update term in CFG into parallel and orthogonal components with respect to the conditional model prediction and observe that the parallel component primarily causes oversaturation, while the orthogonal component enhances image quality. Accordingly, we propose down-weighting the parallel component to achieve high-quality generations without oversaturation. Additionally, we draw a connection between CFG and gradient ascent and introduce a new rescaling and momentum method for the CFG update rule based on this insight. Our approach, termed adaptive projected guidance (APG), retains the quality-boosting advantages of CFG while enabling the use of higher guidance scales without oversaturation. APG is easy to implement and introduces practically no additional computational overhead to the sampling process. Through extensive experiments, we demonstrate that APG is compatible with various conditional diffusion models and samplers, leading to improved FID, recall, and saturation scores while maintaining precision comparable to CFG, making our method a superior plug-and-play alternative to standard classifier-free guidance.
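
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the `eta` weight on the parallel component, and the single-sample array layout are assumptions made for illustration, and the sketch assumes the conditional prediction is non-zero.

```python
import numpy as np

def project(update, reference):
    """Split `update` into parts parallel and orthogonal to `reference`.

    Both arrays are treated as flat vectors for one sample.
    """
    u = update.ravel().astype(np.float64)
    r = reference.ravel().astype(np.float64)
    coeff = np.dot(u, r) / np.dot(r, r)  # scalar projection coefficient
    parallel = (coeff * r).reshape(update.shape)
    orthogonal = update - parallel
    return parallel, orthogonal

def projected_guidance(cond, uncond, scale, eta=0.0):
    """Guided prediction with the parallel component weighted by `eta`.

    `eta` is a hypothetical parameter name: eta=1 gives the standard CFG
    update, eta<1 down-weights the component parallel to `cond`.
    """
    diff = cond - uncond  # the CFG update direction
    parallel, orthogonal = project(diff, cond)
    return cond + (scale - 1.0) * (orthogonal + eta * parallel)
```

Setting `eta=1.0` recombines the two components and gives the ordinary CFG result, so the sketch can be swapped in wherever a CFG combination of conditional and unconditional predictions is computed.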
Problem

Research questions and friction points this paper is trying to address.

Reduce the oversaturation and unrealistic artifacts that appear at high guidance scales
Revisit the CFG update rule so that image quality improves without those side effects
Enable higher guidance scales through adaptive projected guidance (APG)
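
For reference, the CFG update rule these points refer to is commonly written as below. The notation is an assumption, not taken from this page: D(x_t, c) is the conditional model prediction, D(x_t, ∅) the unconditional one, and w the guidance scale. The second form isolates the update term that is added to the conditional prediction.

```latex
\begin{aligned}
\hat{D}(x_t, c) &= D(x_t, \varnothing) + w \,\bigl( D(x_t, c) - D(x_t, \varnothing) \bigr) \\
                &= D(x_t, c) + (w - 1)\,\bigl( D(x_t, c) - D(x_t, \varnothing) \bigr)
\end{aligned}
```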
Innovation

Methods, ideas, or system contributions that make the work stand out.

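Decompose the CFG update term into components parallel and orthogonal to the conditional model prediction
Down-weight the parallel component, which primarily causes oversaturation
Introduce a rescaling and momentum method for the CFG update rule, motivated by a connection between CFG and gradient ascent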
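
The rescaling and momentum idea can be sketched as follows. This page does not give the exact formulation, so the sketch assumes norm clipping for rescaling and a simple running sum for momentum. The names `MomentumBuffer`, `rescale`, `beta`, and `max_norm` are hypothetical.

```python
import numpy as np

class MomentumBuffer:
    """Running combination of update directions across sampling steps.

    `beta` is a hypothetical momentum coefficient; a negative value damps
    the current update using the previous one.
    """

    def __init__(self, beta):
        self.beta = beta
        self.running = 0.0

    def update(self, diff):
        # new running value = current update + beta * previous running value
        self.running = diff + self.beta * self.running
        return self.running

def rescale(diff, max_norm):
    """Shrink `diff` so its L2 norm does not exceed `max_norm` (assumed form)."""
    norm = np.linalg.norm(diff)
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return diff * factor
```

In a sampling loop, the difference between the conditional and unconditional predictions would pass through `MomentumBuffer.update` and `rescale` before being scaled by the guidance weight. That ordering is also an assumption of this sketch.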