S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Classifier-free guidance (CFG) in diffusion models often degrades semantic coherence and generation quality due to its reliance on fixed, deterministic model outputs. Method: We propose S²-Guidance, a training-free guidance strategy motivated by an empirical analysis on Gaussian mixture modeling with a closed-form solution. It applies stochastic block dropping during the forward process to construct lightweight sub-networks of the model itself, whose predictions adaptively steer the denoising trajectory away from low-quality outputs. Contribution/Results: Unlike CFG, S²-Guidance does not rely on static model outputs and requires no auxiliary classifiers, additional training, or external models. Evaluated on text-to-image and text-to-video generation, it significantly improves generation fidelity, prompt adherence, and semantic coherence, and generalizes well across diverse prompts and architectures.

📝 Abstract
Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis on Gaussian mixture modeling with a closed-form solution, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself. Building on this insight, we propose S^2-Guidance, a novel method that leverages stochastic block-dropping during the forward process to construct stochastic sub-networks, effectively guiding the model away from potential low-quality predictions and toward high-quality outputs. Extensive qualitative and quantitative experiments on text-to-image and text-to-video generation tasks demonstrate that S^2-Guidance delivers superior performance, consistently surpassing CFG and other advanced guidance strategies. Our code will be released.
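The guidance idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the sub-network output is stood in for by a perturbed copy of the full prediction (in practice it would come from randomly dropping blocks of the network), and the guidance scales `w` and `s` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def s2_guidance(eps_cond, eps_uncond, eps_sub, w=7.5, s=1.0):
    """Sketch of an S^2-Guidance-style update (scales w, s are hypothetical).

    Starts from the standard CFG prediction, then steers it away from the
    stochastic sub-network prediction, which the paper argues points toward
    potential low-quality outputs.
    """
    eps_cfg = eps_uncond + w * (eps_cond - eps_uncond)   # classifier-free guidance
    return eps_cfg - s * (eps_sub - eps_cond)            # push away from sub-network output

# Toy denoiser outputs for a 4-dim latent (stand-ins for real network calls).
eps_cond = rng.standard_normal(4)
eps_uncond = rng.standard_normal(4)
# "Sub-network" output: the full prediction perturbed, mimicking block dropping.
eps_sub = eps_cond + 0.1 * rng.standard_normal(4)

guided = s2_guidance(eps_cond, eps_uncond, eps_sub)
print(guided.shape)
```

In a real pipeline, `eps_cond`, `eps_uncond`, and `eps_sub` would be three forward passes of the denoiser (the last through a randomly thinned sub-network) at each sampling step.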
Problem

Research questions and friction points this paper is trying to address.

Improving diffusion model output quality
Reducing semantic incoherence in predictions
Enhancing prompt adherence without training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic block-dropping for sub-networks
Enhances diffusion models without training
Improves output quality and semantic coherence