SeedEdit 3.0: Fast and High-Quality Generative Image Editing

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address critical challenges in generative editing of real-world images—including weak instruction following, low identity/content fidelity, and poor inference efficiency—this paper proposes a meta-information-driven framework for efficient, high-fidelity editing. Methodologically, we construct a meta-information paradigm dataset, design a learnable meta-embedding strategy, and formulate a joint optimization objective combining diffusion loss and reward-based loss. Built upon the Seedream 3.0 text-to-image backbone, our approach integrates multi-source hybrid training, vision-language model (VLM)-diffusion co-modeling, and reinforcement learning fine-tuning. Evaluated on both real and synthetic image editing benchmarks, our method achieves a usability rate of 56.1%, substantially outperforming SeedEdit 1.6 (38.4%), GPT-4o (37.1%), and Gemini 2.0 (30.3%). To our knowledge, this is the first work to simultaneously achieve precise instruction adherence, strong identity and IP consistency, and efficient inference in a unified framework.
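The summary describes a joint optimization objective that combines a diffusion (denoising) loss with a reward-based loss. The abstract does not give the exact formulation, but a minimal sketch of such a combined objective, with a hypothetical weighting `lambda_reward`, might look like:

```python
def joint_loss(diffusion_loss: float, reward: float, lambda_reward: float = 0.1) -> float:
    """Hypothetical joint objective: denoising loss plus a reward term.

    `lambda_reward` is an assumed hyperparameter; the reward (e.g. from a
    learned preference model) is maximized, so it enters with a negative sign.
    """
    return diffusion_loss - lambda_reward * reward


# Example: a batch with denoising loss 1.0 and reward 0.5
loss = joint_loss(1.0, 0.5)  # 1.0 - 0.1 * 0.5 = 0.95
```

In practice both terms would be tensors backpropagated through the diffusion model; this scalar version only illustrates how the two losses are balanced.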

📝 Abstract
We introduce SeedEdit 3.0, a companion to our T2I model Seedream 3.0, which significantly improves over our previous SeedEdit versions in both edit instruction following and image content (e.g., ID/IP) preservation on real image inputs. In addition to upgrading the underlying T2I model, this report presents several key improvements. First, we develop an enhanced data curation pipeline with a meta-info paradigm and a meta-info embedding strategy that help mix images from multiple data sources. This allows us to scale editing data effectively, and the meta information helps connect the VLM with the diffusion model more closely. Second, we introduce a joint learning pipeline that combines a diffusion loss with reward losses. Finally, we evaluate SeedEdit 3.0 on our testing benchmarks for real/synthetic image editing, where it achieves the best trade-off between multiple aspects, yielding a high usability rate of 56.1%, compared to SeedEdit 1.6 (38.4%), GPT-4o (37.1%), and Gemini 2.0 (30.3%).
Problem

Research questions and friction points this paper is trying to address.

Weak instruction following in generative editing of real-world images
Low identity/content (ID/IP) fidelity when editing real image inputs
Poor inference efficiency and difficulty scaling editing data across heterogeneous sources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced data curation with meta-info embedding
Joint learning pipeline for diffusion and reward losses
Improved T2I model for better edit instruction following
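The meta-info embedding named above is a strategy for mixing images from multiple data sources. The paper's exact design is not described here, so the following is only a sketch under the assumption that each data source gets a learnable vector appended to the model's conditioning (class names and dimensions are hypothetical):

```python
import random


class MetaEmbedding:
    """Hypothetical per-source embedding table (a sketch, not the paper's design).

    Each data source gets a small vector that is concatenated onto the
    text/image conditioning, letting the model distinguish sources while
    training on a mixed dataset.
    """

    def __init__(self, sources, dim=2, seed=0):
        rng = random.Random(seed)
        # In a real system these would be trainable parameters.
        self.table = {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for s in sources}

    def lookup(self, source):
        return self.table[source]

    def condition(self, text_embedding, source):
        # Concatenate the meta embedding onto the conditioning vector.
        return list(text_embedding) + self.lookup(source)


emb = MetaEmbedding(["synthetic", "real", "human_edited"], dim=2)
cond = emb.condition([0.5, 0.5], "real")  # length 2 + 2 = 4
```

The concatenation is one plausible way to inject source identity; the actual mechanism (e.g., added tokens, cross-attention conditioning) may differ.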