Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Classifier-Free Guidance (CFG) improves generation quality and prompt alignment in conditional diffusion models but severely compromises sample diversity, leading to a fundamental quality-diversity trade-off. This work first identifies that CFG does not correspond to a rigorously defined denoising diffusion process, revealing the absence of a critical Rényi divergence correction term. Building on this theoretical insight, we propose the first theoretically consistent, diversity-preserving CFG-enhanced sampling framework: a Gibbs-like iterative reweighting sampler. Unlike standard CFG, our method explicitly incorporates the missing divergence correction while maintaining computational efficiency. Extensive experiments across image and text-to-audio generation demonstrate consistent superiority over baseline CFG—achieving significant improvements in FID, CLIP Score, diversity metrics (e.g., LPIPS diversity), and human evaluation scores—thereby reconciling high-fidelity generation with rich sample diversity.

📝 Abstract
Classifier-Free Guidance (CFG) is a widely used technique for improving conditional diffusion models by linearly combining the outputs of conditional and unconditional denoisers. While CFG enhances visual quality and improves alignment with prompts, it often reduces sample diversity, leading to a challenging trade-off between quality and diversity. To address this issue, we make two key contributions. First, we show that CFG generally does not correspond to a well-defined denoising diffusion model (DDM). In particular, contrary to common intuition, CFG does not yield samples from the target distribution associated with the limiting CFG score as the noise level approaches zero -- where the data distribution is tilted by a power $w > 1$ of the conditional distribution. We identify the missing component: a Rényi divergence term that acts as a repulsive force and is required to correct CFG and render it consistent with a proper DDM. Our analysis shows that this correction term vanishes in the low-noise limit. Second, motivated by this insight, we propose a Gibbs-like sampling procedure to draw samples from the desired tilted distribution. This method starts with an initial sample from the conditional diffusion model without CFG and iteratively refines it, preserving diversity while progressively enhancing sample quality. We evaluate our approach on both image and text-to-audio generation tasks, demonstrating substantial improvements over CFG across all considered metrics. The code is available at https://github.com/yazidjanati/cfgig
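The linear combination of conditional and unconditional denoiser outputs that the abstract describes can be sketched as follows. This is a minimal illustration of standard CFG, not the paper's corrected sampler; the arrays stand in for denoiser predictions, and the guidance scale `w` plays the role of the tilting power discussed above.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: linearly combine the
    unconditional and conditional denoiser outputs with scale w.
    w = 1 recovers the purely conditional prediction; w > 1
    over-weights the condition, the regime the paper analyzes."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy placeholder predictions (real denoisers output noise estimates).
eps_u = np.array([0.0, 1.0])
eps_c = np.array([1.0, 1.0])
print(cfg_combine(eps_u, eps_c, 1.0))  # w=1: equals eps_c
print(cfg_combine(eps_u, eps_c, 3.0))  # w>1: extrapolates past eps_c
```

With `w > 1` the guided prediction moves beyond the conditional one, which is exactly the over-weighting that sharpens samples but, per the paper, collapses diversity without the Rényi divergence correction.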
Problem

Research questions and friction points this paper is trying to address.

Improves conditional diffusion models by addressing quality-diversity trade-off
Identifies missing Rényi divergence term in Classifier-Free Guidance
Proposes Gibbs-like sampling for tilted distribution with enhanced diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Rényi divergence for CFG correction
Proposes Gibbs-like sampling for tilted distribution
Enhances sample quality and diversity simultaneously
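The iterative refinement pattern behind the Gibbs-like sampler can be sketched schematically. The paper's exact update rules are not reproduced here; every function in this sketch (`sample_conditional`, `renoise`, `denoise`) is a hypothetical placeholder standing in for the real model components, and only the overall loop structure (initialize without CFG, then alternate re-noising and denoising) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_conditional(shape):
    # Placeholder: a draw from the conditional diffusion model
    # *without* CFG -- the paper's diversity-preserving initialization.
    return rng.standard_normal(shape)

def renoise(x, sigma):
    # Placeholder: partially re-noise the current sample.
    return x + sigma * rng.standard_normal(x.shape)

def denoise(x, sigma):
    # Placeholder denoiser; in the paper this step would apply the
    # divergence-corrected, CFG-enhanced update instead.
    return x / (1.0 + sigma**2)

def gibbs_like_refine(shape, n_iters=5, sigma=0.5):
    """Schematic Gibbs-like refinement loop: start from a non-CFG
    conditional sample, then iteratively re-noise and denoise it."""
    x = sample_conditional(shape)
    for _ in range(n_iters):
        x = denoise(renoise(x, sigma), sigma)
    return x

print(gibbs_like_refine((4,)).shape)  # (4,)
```

The design point the sketch captures is that quality is improved by repeated local refinement of an already-diverse sample, rather than by steering every denoising step with a large guidance weight from the start.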