🤖 AI Summary
To address the weak efficacy and high computational overhead of negative prompting in few-step image and video generation, this paper proposes Value Sign Flip (VSF): a training-free, low-overhead technique that suppresses undesired content by flipping the sign of the value vectors contributed by negative-prompt tokens in attention layers. VSF is the first method to achieve negative guidance via attention-value sign manipulation, and it is natively compatible with MMDiT and general cross-attention architectures, enabling seamless integration into state-of-the-art models such as Stable Diffusion 3.5 Turbo and Wan, as well as ComfyUI. Experiments demonstrate that VSF significantly improves negative-prompt adherence in both few-step and standard-step regimes, outperforming baselines such as classifier-free guidance (CFG) while preserving generation quality. The implementation and a ComfyUI plugin are publicly released.
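To make the core idea concrete, here is a rough, hypothetical sketch (not the authors' implementation) of sign-flipped values in a toy single-head attention, written in NumPy. The assumption illustrated: negative-prompt keys and values are appended to the positive-prompt ones, and the negative values are negated, so any attention mass a query places on negative-prompt tokens *subtracts* their content from the output instead of adding it. All function and variable names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_vsf(q, k_pos, v_pos, k_neg, v_neg):
    """Toy single-head attention illustrating the sign-flip idea.

    q:            (num_queries, d) image-token queries
    k_pos, v_pos: (n_pos, d) keys/values from the positive prompt
    k_neg, v_neg: (n_neg, d) keys/values from the negative prompt
    """
    d = q.shape[-1]
    k = np.concatenate([k_pos, k_neg], axis=0)
    # Sign flip: negative-prompt values are negated, so attending to
    # them pushes the output *away* from the undesired content.
    v = np.concatenate([v_pos, -v_neg], axis=0)
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return attn @ v

# A query that strongly matches the negative-prompt key ends up with a
# contribution opposite in sign to that token's value vector.
q = np.array([[10.0, 0.0]])
out = attention_with_vsf(
    q,
    k_pos=np.array([[0.0, 10.0]]), v_pos=np.array([[1.0, 0.0]]),
    k_neg=np.array([[10.0, 0.0]]), v_neg=np.array([[0.0, 1.0]]),
)
```

In this toy setup the query aligns almost entirely with the negative key, so the output is approximately `-v_neg`; with an unflipped `v_neg` it would instead be pulled toward the undesired direction. The real method operates inside a full diffusion/flow-matching model's attention layers, but the arithmetic of the flip is the same.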
📝 Abstract
We introduce Value Sign Flip (VSF), a simple and efficient method for incorporating negative-prompt guidance into few-step diffusion and flow-matching image generation models. Unlike existing approaches such as classifier-free guidance (CFG), NASA, and NAG, VSF dynamically suppresses undesired content by flipping the sign of attention values from negative prompts. Our method adds only a small computational overhead and integrates effectively with MMDiT-style architectures such as Stable Diffusion 3.5 Turbo, as well as cross-attention-based models like Wan. We validate VSF on challenging datasets with complex prompt pairs and demonstrate superior performance in both static image and video generation tasks. Experimental results show that VSF significantly improves negative-prompt adherence compared to prior methods in few-step models, and even to CFG in non-few-step models, while maintaining competitive image quality. Code and a ComfyUI node are available at https://github.com/weathon/VSF/tree/main.