S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Classifier-free guidance (CFG) in diffusion models often degrades semantic coherence and generation quality due to its reliance on fixed, deterministic model outputs. Method: We propose S²-Guidance, a training-free guidance strategy motivated by an empirical analysis on Gaussian mixture modeling with a closed-form solution. It applies stochastic block dropping during the forward process to construct lightweight sub-networks of the model itself, whose predictions adaptively steer the denoising trajectory away from low-quality outputs. Contribution/Results: Unlike CFG, S²-Guidance does not rely on static model outputs and requires no auxiliary classifiers, additional training, or external models. Evaluated on text-to-image and text-to-video generation, it significantly improves generation fidelity, prompt adherence, and semantic coherence, and generalizes well across diverse prompts and architectures.

📝 Abstract
Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However, through an empirical analysis on Gaussian mixture modeling with a closed-form solution, we observe a discrepancy between the suboptimal results produced by CFG and the ground truth. The model's excessive reliance on these suboptimal predictions often leads to semantic incoherence and low-quality outputs. To address this issue, we first empirically demonstrate that the model's suboptimal predictions can be effectively refined using sub-networks of the model itself. Building on this insight, we propose S^2-Guidance, a novel method that leverages stochastic block-dropping during the forward process to construct stochastic sub-networks, effectively guiding the model away from potential low-quality predictions and toward high-quality outputs. Extensive qualitative and quantitative experiments on text-to-image and text-to-video generation tasks demonstrate that S^2-Guidance delivers superior performance, consistently surpassing CFG and other advanced guidance strategies. Our code will be released.
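The guidance idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the sub-network output is stood in for by a perturbed copy of the full prediction (in practice it would come from randomly dropping blocks of the network), and the guidance scales `w` and `s` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def s2_guidance(eps_cond, eps_uncond, eps_sub, w=7.5, s=1.0):
    """Sketch of an S^2-Guidance-style update (scales w, s are hypothetical).

    Starts from the standard CFG prediction, then steers it away from the
    stochastic sub-network prediction, which the paper argues points toward
    potential low-quality outputs.
    """
    eps_cfg = eps_uncond + w * (eps_cond - eps_uncond)   # classifier-free guidance
    return eps_cfg - s * (eps_sub - eps_cond)            # push away from sub-network output

# Toy denoiser outputs for a 4-dim latent (stand-ins for real network calls).
eps_cond = rng.standard_normal(4)
eps_uncond = rng.standard_normal(4)
# "Sub-network" output: the full prediction perturbed, mimicking block dropping.
eps_sub = eps_cond + 0.1 * rng.standard_normal(4)

guided = s2_guidance(eps_cond, eps_uncond, eps_sub)
print(guided.shape)
```

In a real pipeline, `eps_cond`, `eps_uncond`, and `eps_sub` would be three forward passes of the denoiser (the last through a randomly thinned sub-network) at each sampling step.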
Problem

Research questions and friction points this paper is trying to address.

Improving diffusion model output quality
Reducing semantic incoherence in predictions
Enhancing prompt adherence without training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic block-dropping for sub-networks
Enhances diffusion models without training
Improves output quality and semantic coherence