Guiding Diffusion Models with Semantically Degraded Conditions

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Classifier-Free Guidance (CFG) relies on a semantically vacuous null prompt, which entangles geometric and semantic representations and limits generation fidelity in complex compositional tasks. This work proposes Condition-Degradation Guidance (CDG), which replaces the null prompt with a partially degraded condition, shifting the guidance paradigm from "good versus empty" to "good versus nearly good" to sharpen semantic control. By analyzing the functional split between content tokens and context-aggregating tokens in Transformer-based text encoders, CDG selectively degrades only the content tokens to construct adaptive negative samples, yielding a plug-and-play method that requires no additional models or training. Evaluated across architectures including Stable Diffusion 3, FLUX, and Qwen-Image, CDG consistently improves compositional generation accuracy and image-text alignment with negligible computational overhead.

📝 Abstract
Classifier-Free Guidance (CFG) is a cornerstone of modern text-to-image models, yet its reliance on a semantically vacuous null prompt ($\varnothing$) generates a guidance signal prone to geometric entanglement. This is a key factor limiting its precision, leading to well-documented failures in complex compositional tasks. We propose Condition-Degradation Guidance (CDG), a novel paradigm that replaces the null prompt with a strategically degraded condition, $\boldsymbol{c}_{\text{deg}}$. This reframes guidance from a coarse "good vs. null" contrast to a more refined "good vs. almost good" discrimination, thereby compelling the model to capture fine-grained semantic distinctions. We find that tokens in transformer text encoders split into two functional roles: content tokens encoding object semantics, and context-aggregating tokens capturing global context. By selectively degrading only the former, CDG constructs $\boldsymbol{c}_{\text{deg}}$ without external models or training. Validated across diverse architectures including Stable Diffusion 3, FLUX, and Qwen-Image, CDG markedly improves compositional accuracy and text-image alignment. As a lightweight, plug-and-play module, it achieves this with negligible computational overhead. Our work challenges the reliance on static, information-sparse negative samples and establishes a new principle for diffusion guidance: the construction of adaptive, semantically-aware negative samples is critical to achieving precise semantic control. Code is available at https://github.com/Ming-321/Classifier-Degradation-Guidance.
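The guidance scheme the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a generic denoiser `eps_fn(x, t, cond)`, uses Gaussian perturbation of content-token embeddings as a stand-in for the paper's unspecified degradation operator, and all names (`degrade_condition`, `guided_eps`, `content_mask`) are hypothetical.

```python
import numpy as np

def degrade_condition(cond_embed, content_mask, noise_scale=1.0, seed=0):
    """Build c_deg by perturbing only the content-token embeddings.

    Gaussian noise here is an illustrative stand-in for the paper's
    degradation operator; context-aggregating tokens are left intact.
    """
    rng = np.random.default_rng(seed)
    c_deg = cond_embed.copy()
    noise = rng.normal(scale=noise_scale, size=cond_embed.shape)
    c_deg[content_mask] += noise[content_mask]
    return c_deg

def guided_eps(eps_fn, x, t, cond, c_deg, w=7.5):
    """CDG-style combination: contrast the full condition ("good")
    against the degraded one ("almost good") instead of a null prompt."""
    return eps_fn(x, t, c_deg) + w * (eps_fn(x, t, cond) - eps_fn(x, t, c_deg))
```

Structurally this is standard CFG with the unconditional branch swapped for the degraded condition, which is why it drops into existing samplers without extra models or training.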
Problem

Research questions and friction points this paper is trying to address.

Classifier-Free Guidance
semantic degradation
text-to-image generation
compositional accuracy
diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Condition-Degradation Guidance
semantic degradation
diffusion guidance
compositional generation
text-to-image alignment
Shilong Han
College of Science, National University of Defense Technology
Yuming Zhang
University of Kentucky
weld, sensor, modeling, control, robot
Hongxia Wang
College of Science, National University of Defense Technology