🤖 AI Summary
Standard classifier-free guidance (CFG) applies a uniform guidance scale across all frequency bands, causing oversaturation and reduced diversity at high guidance scales and insufficient visual fidelity at low scales. This work uncovers the frequency-domain mechanism of CFG in diffusion models: low-frequency components govern global structure and conditional alignment, whereas high-frequency components enhance fine-grained detail clarity; uniform scaling disrupts their intrinsic balance. To address this, the authors propose frequency-decoupled guidance (FDG), which leverages Fourier analysis to decompose CFG into parallel low- and high-frequency guidance pathways, enabling independent control of their respective guidance strengths. FDG is plug-and-play, requiring no architectural modifications or retraining. Extensive experiments across multiple datasets and diffusion models demonstrate that FDG consistently improves both generation quality and diversity, achieving lower FID and higher recall than standard CFG and thereby mitigating the quality-diversity trade-off.
📝 Abstract
Classifier-free guidance (CFG) has become an essential component of modern conditional diffusion models. Although highly effective in practice, the underlying mechanisms by which CFG enhances quality, detail, and prompt alignment are not fully understood. We present a novel perspective on CFG by analyzing its effects in the frequency domain, showing that low and high frequencies have distinct impacts on generation quality. Specifically, low-frequency guidance governs global structure and condition alignment, while high-frequency guidance mainly enhances visual fidelity. However, applying a uniform scale across all frequencies -- as is done in standard CFG -- leads to oversaturation and reduced diversity at high scales and degraded visual quality at low scales. Based on these insights, we propose frequency-decoupled guidance (FDG), an effective approach that decomposes CFG into low- and high-frequency components and applies separate guidance strengths to each component. FDG improves image quality at low guidance scales and avoids the drawbacks of high CFG scales by design. Through extensive experiments across multiple datasets and models, we demonstrate that FDG consistently enhances sample fidelity while preserving diversity, leading to improved FID and recall compared to CFG, establishing our method as a plug-and-play alternative to standard classifier-free guidance.
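The core idea can be sketched in a few lines. The paper does not specify the exact decomposition here, so the sketch below assumes a hard radial FFT mask with a `cutoff` frequency; the function name and all parameters are illustrative, not the authors' implementation. The CFG direction (conditional minus unconditional prediction) is split into low- and high-frequency parts, each scaled by its own guidance weight:

```python
import numpy as np

def frequency_decoupled_guidance(eps_cond, eps_uncond, w_low, w_high, cutoff=0.1):
    """Illustrative FDG sketch (hypothetical decomposition, not the paper's code).

    eps_cond, eps_uncond : model noise predictions, shape (..., H, W)
    w_low, w_high        : separate guidance scales for low/high frequencies
    cutoff               : radial frequency threshold (cycles/pixel), an assumption
    """
    delta = eps_cond - eps_uncond                      # standard CFG direction
    F = np.fft.fft2(delta, axes=(-2, -1))              # to frequency domain
    H, W = delta.shape[-2:]
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    low_mask = np.sqrt(fy**2 + fx**2) <= cutoff        # hard low-pass mask
    delta_low = np.fft.ifft2(F * low_mask, axes=(-2, -1)).real
    delta_high = np.fft.ifft2(F * ~low_mask, axes=(-2, -1)).real
    # apply separate guidance strengths to each frequency band
    return eps_uncond + w_low * delta_low + w_high * delta_high
```

Because the two bands partition the spectrum exactly, setting `w_low == w_high == w` recovers standard CFG, so the sketch is a strict generalization: lowering `w_high` relative to `w_low` trades off high-frequency sharpening against the oversaturation the abstract attributes to large uniform scales.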