CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the instability, overshooting, and degraded semantic fidelity commonly observed in Classifier-Free Guidance (CFG) at large guidance scales. The authors reinterpret CFG as a control mechanism for first-order continuous-time generative flows, where the difference between the conditional and unconditional predictions serves as an error signal that modulates the velocity field. Building on this perspective, they propose Sliding Mode Control CFG (SMC-CFG), which introduces nonlinear feedback correction through an exponential sliding surface and a switching control term. A Lyapunov stability analysis provides theoretical support for finite-time convergence. Experiments on text-to-image models including Stable Diffusion 3.5, Flux, and Qwen-Image show that SMC-CFG improves semantic alignment over standard CFG and is more robust across a wide range of guidance scales.
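For readers unfamiliar with sliding mode control, the finite-time claim usually rests on a standard reaching-condition argument. The sketch below is the generic textbook version for a scalar error, not the paper's own proof; the symbols $e$, $s$, $\lambda$, $\eta$, and $t_r$ are assumptions introduced here for illustration.

```latex
\begin{aligned}
s &= \dot{e} + \lambda e, \qquad \lambda > 0
  && \text{(sliding surface over the error } e\text{)} \\
V(s) &= \tfrac{1}{2} s^{2}
  && \text{(Lyapunov candidate)} \\
\dot{V} &= s\,\dot{s} \le -\eta\,|s|, \qquad \eta > 0
  && \text{(reaching condition enforced by the switching term)} \\
\frac{d}{dt}|s| &\le -\eta \ \text{ for } s \neq 0
  \;\Rightarrow\; |s(t)| \le |s(0)| - \eta t \\
s(t) &= 0 \ \text{ for all } t \ge t_r, \qquad t_r \le \frac{|s(0)|}{\eta}
  && \text{(finite reaching time)} \\
e(t) &= e(t_r)\, e^{-\lambda (t - t_r)} \ \text{ for } t \ge t_r
  && \text{(exponential decay on the surface)}
\end{aligned}
```

Under these assumptions the surface $s = 0$ is reached in finite time, after which the error decays exponentially at rate $\lambda$, which is why such a surface is often described as exponential.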

📝 Abstract

Classifier-Free Guidance (CFG) has emerged as a central approach for enhancing semantic alignment in flow-based diffusion models. In this paper, we explore a unified framework called CFG-Ctrl, which reinterprets CFG as a control applied to the first-order continuous-time generative flow, using the conditional-unconditional discrepancy as an error signal to adjust the velocity field. From this perspective, we summarize vanilla CFG as a proportional controller (P-control) with fixed gain, and typical follow-up variants develop extended control-law designs derived from it. However, existing methods mainly rely on linear control, which inherently leads to instability, overshooting, and degraded semantic fidelity, especially at large guidance scales. To address this, we introduce Sliding Mode Control CFG (SMC-CFG), which drives the generative flow toward a rapidly convergent sliding manifold. Specifically, we define an exponential sliding mode surface over the semantic prediction error and introduce a switching control term to establish nonlinear feedback-guided correction. Moreover, we provide a Lyapunov stability analysis to theoretically support finite-time convergence. Experiments across text-to-image generation models including Stable Diffusion 3.5, Flux, and Qwen-Image demonstrate that SMC-CFG outperforms standard CFG in semantic alignment and enhances robustness across a wide range of guidance scales. Project Page: https://hanyang-21.github.io/CFG-Ctrl
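The control view described in the abstract can be made concrete with a minimal sketch. The first function is the standard CFG update written as a proportional controller on the error signal. The second is only an illustrative sliding-mode-style correction: the discrete surface `s`, the parameters `lam` and `k`, and the function names are assumptions made for this example and are not taken from the paper, whose exact control law is not given in the abstract.

```python
import numpy as np


def cfg_p_control(v_cond, v_uncond, w):
    """Vanilla CFG as P-control: fixed gain w applied to the error signal."""
    e = v_cond - v_uncond          # conditional-unconditional discrepancy
    return v_uncond + w * e        # guided velocity


def smc_style_guidance(v_cond, v_uncond, e_prev, w, lam=0.1, k=0.05):
    """Illustrative sliding-mode-style correction (hypothetical, not the paper's law).

    Returns the guided velocity and the corrected error to carry to the next step.
    """
    e = v_cond - v_uncond
    if e_prev is None:
        # First step: no error history yet, so fall back to plain P-control.
        return v_uncond + w * e, e
    # Assumed discrete sliding surface: error change plus lam times the previous error.
    s = (e - e_prev) + lam * e_prev
    # Switching term: push each component of the error against the sign of s.
    e_corr = e - k * np.sign(s)
    return v_uncond + w * e_corr, e_corr


# Toy usage with made-up velocity predictions.
v_c = np.array([1.0, 2.0])
v_u = np.array([0.5, 1.0])
v_first, e_first = smc_style_guidance(v_c, v_u, None, w=3.0)
v_next, e_next = smc_style_guidance(v_c, v_u, e_first, w=3.0)
```

With `k=0` the second function reduces to vanilla CFG, which makes the switching term easy to ablate in a sampler loop.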

Problem

Research questions and friction points this paper is trying to address.

Classifier-Free Guidance

diffusion models

semantic alignment

control theory

stability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Classifier-Free Guidance

Sliding Mode Control

Generative Flow