Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current unified models for text-to-image generation decouple understanding from generation, preventing their strong semantic comprehension capabilities from compensating for generative limitations during inference. To address this, we propose the “Understanding-injected Generation” (UiG) framework—the first to dynamically integrate a unified model’s understanding capacity into the generative process. UiG employs image editing as an intermediary, leveraging chain-of-thought prompting, edit instruction generation, and multi-step verification-and-refinement to progressively inject semantic understanding into generation. This breaks the conventional serial paradigm, enabling continuous semantic calibration of outputs. Evaluated on the TIIF long-prompt benchmark, UiG achieves a 3.92% improvement over state-of-the-art methods, demonstrating significantly enhanced complex semantic alignment and generation robustness.

📝 Abstract
Recent works have made notable advancements in enhancing unified models for text-to-image generation through Chain-of-Thought (CoT) reasoning. However, these reasoning methods separate the processes of understanding and generation, which limits their ability to guide the reasoning of unified models in addressing the deficiencies of their generative capabilities. To this end, we propose a novel reasoning framework for unified models, Understanding-in-Generation (UiG), which harnesses the robust understanding capabilities of unified models to reinforce their performance in image generation. The core insight of UiG is to integrate guidance from the model's strong understanding capabilities into the reasoning process, thereby mitigating the limitations of its generative abilities. To achieve this, we introduce "Image Editing" as a bridge to infuse understanding into the generation process. Initially, we verify the generated image and incorporate the understanding of unified models into the editing instructions. Subsequently, we enhance the generated image step by step, gradually infusing understanding into the generation process. Our UiG framework demonstrates a significant performance improvement in text-to-image generation over existing text-to-image reasoning methods, e.g., a 3.92% gain on the long-prompt setting of the TIIF benchmark. Project code: https://github.com/QC-LY/UiG
Problem

Research questions and friction points this paper is trying to address.

Separating understanding and generation limits unified models' reasoning capabilities
Mitigating generative ability limitations through understanding-guided reasoning processes
Enhancing text-to-image generation by infusing understanding into generation steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates understanding capabilities into generation process
Uses Image Editing as bridge between understanding and generation
Enhances generated images step by step with understanding
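The verify-and-refine loop described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all model calls (`generate_image`, `verify_image`, `propose_edit`, `edit_image`) are hypothetical placeholders standing in for a unified model's generation, understanding, and editing capabilities.

```python
def generate_image(prompt):
    # Placeholder: a real system would call the unified model's generator.
    return {"prompt": prompt, "satisfied": set()}

def verify_image(image, prompt, requirements):
    # Placeholder: the model's understanding capability would check the
    # image against the prompt and return the requirements it fails to meet.
    return [r for r in requirements if r not in image["satisfied"]]

def propose_edit(missing):
    # Placeholder: turn the first unmet requirement into an edit instruction.
    return f"add: {missing[0]}"

def edit_image(image, instruction):
    # Placeholder: apply the edit; here we simply mark the requirement met.
    image["satisfied"].add(instruction.removeprefix("add: "))
    return image

def uig_generate(prompt, requirements, max_steps=5):
    """Generate an image, then iteratively verify and edit it so that
    understanding is infused into generation step by step."""
    image = generate_image(prompt)
    for _ in range(max_steps):
        missing = verify_image(image, prompt, requirements)
        if not missing:
            break  # understanding confirms the image matches the prompt
        image = edit_image(image, propose_edit(missing))
    return image
```

Each iteration turns a verification result into a concrete edit instruction, so the model's understanding directly steers the next generation step rather than being applied only once before generation.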