Orthogonal Negative Guidance in Attention Feature Space for Text-to-Image Generation

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of explicitly suppressing specified objects or attributes in text-to-image generation. The authors propose a training-free orthogonal negative guidance method that operates in the attention output space of MM-DiT-based transformers. The method orthogonalizes negative-prompt attention features with respect to positive-prompt features and subtracts only the orthogonal component, so the unwanted concept is suppressed while the desired semantics are preserved. It also supports multi-concept suppression and adjustable suppression strength. Experiments on FLUX-dev and FLUX-schnell show favorable trade-offs among concept suppression, prompt alignment, and image quality, and in human evaluation the method outperforms the second-best baseline by 18.78%.

📝 Abstract

Text-to-image (T2I) models have become increasingly capable of generating high-quality images. Yet, enforcing the explicit absence of a specified object or attribute remains a fundamentally challenging problem. Existing approaches, including prompt negation, post-hoc editing, and negative guidance, remain insufficient for explicit concept suppression, often failing to remove the target concept or degrading overall image quality. To this end, we propose Orthogonal Negative Guidance in attention feature space, a training-free method that operates in the attention output space of MM-DiT-based T2I transformers. Our method orthogonalizes negative-prompt attention features with respect to positive-prompt features and subtracts only the orthogonal component, suppressing unwanted concepts while preserving desired semantics. Experiments on FLUX-dev and FLUX-schnell show that our method achieves favorable trade-offs between concept suppression, prompt alignment, and image quality. In human evaluation, our method outperforms the second-best baseline by 18.78%. We further show that our method supports multi-concept suppression and adjustable concept suppression.
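The orthogonalization step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no exact formula, so the per-token projection along the feature dimension, the function name `orthogonal_negative_guidance`, the `scale` parameter, and the `eps` stabilizer are all assumptions made for illustration.

```python
import numpy as np


def orthogonal_negative_guidance(pos, neg, scale=1.0, eps=1e-8):
    """Subtract from `pos` only the part of `neg` that is orthogonal to `pos`.

    pos, neg: attention output features of shape (tokens, dim).
    scale:    hypothetical knob for suppression strength (an assumption).
    eps:      small constant that avoids division by zero (an assumption).
    """
    # Per-token projection coefficient of neg onto pos (assumed granularity).
    dot = np.sum(neg * pos, axis=-1, keepdims=True)
    norm_sq = np.sum(pos * pos, axis=-1, keepdims=True)
    coef = dot / (norm_sq + eps)

    # Split neg into a component parallel to pos and an orthogonal remainder.
    neg_parallel = coef * pos
    neg_orthogonal = neg - neg_parallel

    # Remove only the orthogonal remainder, leaving the pos direction untouched.
    return pos - scale * neg_orthogonal


# Toy features: 2 tokens, 2 feature dimensions.
pos = np.array([[1.0, 0.0], [0.0, 2.0]])
neg = np.array([[1.0, 1.0], [3.0, 4.0]])
guided = orthogonal_negative_guidance(pos, neg, scale=1.0)
```

Because the subtracted term is orthogonal to `pos`, the component of the result along `pos` is unchanged in this sketch, which is one way to read the abstract's claim that desired semantics are preserved. Setting `scale` to 0 returns `pos` unmodified.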

Problem

Research questions and friction points this paper is trying to address.

text-to-image generation

negative guidance

concept suppression

prompt negation

image quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Negative Guidance

attention feature space

text-to-image generation