🤖 AI Summary
Classifier-free guidance (CFG) in diffusion models suffers from guidance collapse under few-step sampling, primarily due to severe divergence between positive and negative prediction branches.
Method: We propose Normalized Attention Guidance (NAG), a training-free, plug-and-play inference-time intervention that applies L1-norm normalization and adaptive refinement within the attention space of UNet or DiT architectures.
Contribution/Results: NAG is the first method to enable universal negative guidance across architectures (UNet/DiT), modalities (image/video), and sampling schedules (multi-step/few-step), fully circumventing CFG’s branch divergence. Experiments demonstrate significant improvements in CLIP Score and ImageReward, substantial reductions in FID and PFID, and superior human evaluation ratings. NAG incurs negligible computational overhead, requires zero training, and maintains full compatibility with mainstream diffusion frameworks.
📝 Abstract
Negative guidance -- explicitly suppressing unwanted attributes -- remains a fundamental challenge in diffusion models, particularly in few-step sampling regimes. While Classifier-Free Guidance (CFG) works well in standard settings, it fails under aggressive sampling step compression due to divergent predictions between positive and negative branches. We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. Unlike existing approaches, NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video), functioning as a *universal* plug-in with minimal computational overhead. Through extensive experimentation, we demonstrate consistent improvements in text alignment (CLIP Score), fidelity (FID, PFID), and human-perceived quality (ImageReward). Our ablation studies validate each design component, while user studies confirm significant preference for NAG-guided outputs. As a model-agnostic inference-time approach requiring no retraining, NAG provides effortless negative guidance for all modern diffusion frameworks -- pseudocode in the Appendix!
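The core mechanism described above -- extrapolating between positive and negative branches in attention space, then taming the result with L1-based normalization and refinement -- can be sketched as follows. This is a minimal illustrative implementation, not the paper's reference code: the function name `nag_guidance` and the hyperparameters `scale`, `tau`, and `alpha` are assumed for illustration.

```python
import numpy as np

def nag_guidance(z_pos, z_neg, scale=5.0, tau=2.5, alpha=0.25):
    """Sketch of attention-space negative guidance with L1 normalization.

    z_pos, z_neg: attention-layer outputs from the positive / negative
    prompt branches (same shape). All hyperparameter values here are
    illustrative assumptions, not values from the paper.
    """
    # Extrapolate the positive branch away from the negative branch,
    # in attention space rather than on the final noise prediction.
    z_ext = z_pos + scale * (z_pos - z_neg)

    # L1-based normalization: cap how far the extrapolated features
    # may drift in magnitude from the positive branch.
    norm_pos = np.abs(z_pos).sum(axis=-1, keepdims=True)
    norm_ext = np.abs(z_ext).sum(axis=-1, keepdims=True)
    ratio = norm_ext / (norm_pos + 1e-8)
    z_norm = np.where(ratio > tau, z_ext * (tau / ratio), z_ext)

    # Refinement: blend back toward the positive branch to preserve fidelity.
    return alpha * z_norm + (1.0 - alpha) * z_pos
```

Note that when the two branches agree (`z_neg == z_pos`), the extrapolation, normalization, and blend all reduce to the identity, so the guidance is a no-op -- the intervention only acts where the branches diverge.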