Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models

📅 2025-08-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Text-to-image diffusion models struggle to suppress content that is strongly associated with a specific concept (e.g., a prompt for “Charlie Chaplin” almost always yields a “mustache”), because existing methods lack a way to intervene in the text embedding space and separate such entangled features. To address this, the authors propose a delta vector that translates the text embedding to weaken the influence of the undesired content; the delta vector can be obtained zero-shot, without model fine-tuning or additional training. They further propose Selective Suppression with Delta Vector (SSDV), which integrates the delta vector into the cross-attention mechanism so that suppression acts in the regions where the unwanted content would otherwise appear. For personalized T2I models, the delta vector is optimized to achieve more precise suppression, which previous baselines could not do. Experiments show that the approach outperforms existing methods both quantitatively and qualitatively.

📝 Abstract

Text-to-Image (T2I) diffusion models have made significant progress in generating diverse, high-quality images from textual prompts. However, these models still struggle to suppress content that is strongly entangled with specific words. For example, when generating an image of “Charlie Chaplin”, a “mustache” consistently appears even when the model is explicitly instructed not to include it, because the concept of “mustache” is strongly entangled with “Charlie Chaplin”. To address this issue, we propose a novel approach that directly suppresses such entangled content within the text embedding space of diffusion models. Our method introduces a delta vector that modifies the text embedding to weaken the influence of undesired content in the generated image, and we further demonstrate that this delta vector can be easily obtained through a zero-shot approach. Furthermore, we propose a Selective Suppression with Delta Vector (SSDV) method that integrates the delta vector into the cross-attention mechanism, enabling more effective suppression of unwanted content in the regions where it would otherwise be generated. Additionally, we enable more precise suppression in personalized T2I models by optimizing the delta vector, which previous baselines were unable to achieve. Extensive experimental results demonstrate that our approach significantly outperforms existing methods in terms of both quantitative and qualitative metrics.
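As a rough illustration of the delta-vector idea described in the abstract, the sketch below shifts a token embedding by a scaled delta vector. The abstract does not say how the delta vector is computed zero-shot, so the difference-of-embeddings rule in `zero_shot_delta`, the function names, the `scale` parameter, and the toy 4-dimensional vectors are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def zero_shot_delta(emb_without: Vector, emb_with: Vector) -> Vector:
    """Hypothetical zero-shot delta: embedding of a prompt WITHOUT the
    unwanted content minus embedding of a prompt WITH it.
    ASSUMPTION: the source does not specify this construction."""
    if len(emb_without) != len(emb_with):
        raise ValueError("embeddings must have the same dimension")
    return [a - b for a, b in zip(emb_without, emb_with)]


def translate_embedding(token_emb: Vector, delta: Vector, scale: float = 1.0) -> Vector:
    """Translate one token embedding by scale * delta."""
    if len(token_emb) != len(delta):
        raise ValueError("embedding and delta must have the same dimension")
    return [e + scale * d for e, d in zip(token_emb, delta)]


# Toy 4-d stand-ins for real text-encoder outputs (values are made up).
chaplin = [1.0, 0.5, 0.75, 0.25]          # stand-in for "Charlie Chaplin"
with_mustache = [0.25, 0.0, 1.0, 0.0]     # stand-in for "a man with a mustache"
without_mustache = [0.25, 0.0, 0.0, 0.0]  # stand-in for "a man"

delta = zero_shot_delta(without_mustache, with_mustache)
shifted = translate_embedding(chaplin, delta, scale=0.5)

print(delta)    # direction that removes the toy "mustache" component
print(shifted)  # embedding with that component weakened
```

In this toy setup only the third coordinate carries the unwanted content, so the translation lowers that coordinate and leaves the others unchanged. In a real text encoder the delta would be a dense vector, and `scale` would trade suppression strength against fidelity to the original prompt.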

Problem

Research questions and friction points this paper is trying to address.

Suppress strongly entangled content in text-to-image diffusion models

Modify text embeddings to weaken undesired content influence

Enable precise suppression in personalized T2I generation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Delta vector modifies text embedding

Zero-shot approach obtains delta vector

SSDV integrates the delta vector into the cross-attention mechanism
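The source only states that SSDV brings the delta vector into cross-attention so that suppression acts in the regions where the unwanted content would otherwise be generated. One way to picture that is a cross-attention layer that applies the delta vector to the target token's value only for image regions (queries) that attend strongly to that token. The gating rule, the `threshold` parameter, and the function name `selective_cross_attention` below are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def selective_cross_attention(
    queries: List[Vector],   # one query per image region
    keys: List[Vector],      # one key per text token
    values: List[Vector],    # one value per text token
    target_idx: int,         # index of the token to suppress
    delta: Vector,           # delta vector in value space
    threshold: float = 0.5,  # ASSUMPTION: hypothetical gating threshold
) -> List[Vector]:
    """Scaled dot-product cross-attention where the target token's value is
    shifted by delta only for queries whose attention to it exceeds threshold."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        attn = softmax([dot(q, k) / math.sqrt(dim) for k in keys])
        gated = attn[target_idx] > threshold  # region would draw on the target
        out = [0.0] * len(values[0])
        for j, (w, v) in enumerate(zip(attn, values)):
            if j == target_idx and gated:
                v = [vi + di for vi, di in zip(v, delta)]
            for i, vi in enumerate(v):
                out[i] += w * vi
        outputs.append(out)
    return outputs


# Toy example: two regions, two tokens; token 1 is the unwanted one.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
queries = [[10.0, 0.0], [0.0, 10.0]]  # region 0 attends to token 0, region 1 to token 1
delta = [0.0, -1.0]                   # cancels the toy value of token 1

outputs = selective_cross_attention(queries, keys, values, target_idx=1, delta=delta)
baseline = selective_cross_attention(queries, keys, values, target_idx=1, delta=[0.0, 0.0])
print(outputs)
print(baseline)
```

In this toy run, the region that attends mostly to the unwanted token loses that token's contribution, while the other region's output matches the unmodified baseline. A hard threshold is used here only to keep the sketch short; a soft, attention-weighted gate would be an equally plausible reading of "selective".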

🔎 Similar Papers

Hiding and Recovering Knowledge in Text-to-Image Diffusion Models via Learnable Prompts