Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models

📅 2025-04-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Text-to-image (T2I) models frequently suffer from unintended information flow among text tokens, leading to semantic leakage, feature misbinding, and omission of key concepts. This work systematically identifies cross-phrase semantic leakage and token-level representation redundancy, and proposes the first training-free, token-level intervention method: selective masking and resetting of contextualized text representations to precisely regulate information flow. Our approach performs token-level information-flow diagnosis within diffusion models, leveraging comparative analysis between contextualized and non-contextualized embeddings. Experiments across multiple mainstream T2I models demonstrate a 21% reduction in image generation error rate and an 85% decrease in semantic leakage, validating both effectiveness and generalizability. The core contribution lies in establishing a causal link between token-level information flow and generation failures, and introducing a lightweight, plug-and-play representation modulation paradigm.

Technology Category

Application Category

📝 Abstract
Text-to-Image (T2I) models often suffer from issues such as semantic leakage, incorrect feature binding, and omissions of key concepts in the generated image. This work studies these phenomena by looking into the role of information flow between textual token representations. To this end, we generate images by applying the diffusion component on a subset of contextual token representations in a given prompt and observe several interesting phenomena. First, in many cases, a word or multiword expression is fully represented by one or two tokens, while other tokens are redundant. For example, in"San Francisco's Golden Gate Bridge", the token"gate"alone captures the full expression. We demonstrate the redundancy of these tokens by removing them after textual encoding and generating an image from the resulting representation. Surprisingly, we find that this process not only maintains image generation performance but also reduces errors by 21% compared to standard generation. We then show that information can also flow between different expressions in a sentence, which often leads to semantic leakage. Based on this observation, we propose a simple, training-free method to mitigate semantic leakage: replacing the leaked item's representation after the textual encoding with its uncontextualized representation. Remarkably, this simple approach reduces semantic leakage by 85%. Overall, our work provides a comprehensive analysis of information flow across textual tokens in T2I models, offering both novel insights and practical benefits.
Problem

Research questions and friction points this paper is trying to address.

Analyzes information flow between textual tokens in T2I models
Identifies token redundancy and improves image generation accuracy
Proposes method to reduce semantic leakage by 85%
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes token redundancy in text encoding
Reduces errors by removing redundant tokens
Mitigates semantic leakage via uncontextualized representations
🔎 Similar Papers
No similar papers found.