FireRed-Image-Edit-1.0 Technical Report

📅 2026-02-12
🤖 AI Summary
This work addresses weak semantic alignment, limited controllability, and insufficient generalization in instruction-driven image editing through a systematic optimization framework. Starting from a raw corpus of 1.6 billion samples that is filtered down to over 100 million high-quality samples, and built on a diffusion Transformer architecture, the approach uses multi-stage training (pre-training, supervised fine-tuning, and reinforcement learning) together with several techniques: a Multi-Condition Aware Bucket Sampler, Stochastic Instruction Alignment, Asymmetric Gradient Optimization for DPO, DiffusionNFT with a layout-aware OCR reward, and a differentiable Consistency Loss. The authors also introduce REDEdit-Bench, a benchmark covering 15 editing categories. The resulting model achieves competitive or superior performance against both open-source and proprietary systems on REDEdit-Bench, ImgEdit, and GEdit. Code, models, and the benchmark suite are publicly released.

📝 Abstract
We present FireRed-Image-Edit, a diffusion transformer for instruction-based image editing that achieves state-of-the-art performance through systematic optimization of data curation, training methodology, and evaluation design. We construct a 1.6B-sample training corpus, comprising 900M text-to-image and 700M image editing pairs from diverse sources. After rigorous cleaning, stratification, auto-labeling, and two-stage filtering, we retain over 100M high-quality samples balanced between generation and editing, ensuring strong semantic coverage and instruction alignment. Our multi-stage training pipeline progressively builds editing capability via pre-training, supervised fine-tuning, and reinforcement learning. To improve data efficiency, we introduce a Multi-Condition Aware Bucket Sampler for variable-resolution batching and Stochastic Instruction Alignment with dynamic prompt re-indexing. To stabilize optimization and enhance controllability, we propose Asymmetric Gradient Optimization for DPO, DiffusionNFT with layout-aware OCR rewards for text editing, and a differentiable Consistency Loss for identity preservation. We further establish REDEdit-Bench, a comprehensive benchmark spanning 15 editing categories, including newly introduced beautification and low-level enhancement tasks. Extensive experiments on REDEdit-Bench and public benchmarks (ImgEdit and GEdit) demonstrate competitive or superior performance against both open-source and proprietary systems. We release code, models, and the benchmark suite to support future research.
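
The abstract names a Multi-Condition Aware Bucket Sampler for variable-resolution batching but does not describe how it works. As a rough illustration of the general idea of bucketed batching only, the sketch below groups samples so that every batch shares one resolution and one count of conditioning images. The function name `bucket_batches`, the sample fields (`width`, `height`, `num_conditions`), and the choice of bucket key are all assumptions made for this example and are not taken from the report.

```python
import random
from collections import defaultdict


def bucket_batches(samples, batch_size, seed=0):
    """Return a shuffled list of (bucket_key, [sample indices]) batches.

    Hypothetical sketch: samples sharing the same (width, height,
    num_conditions) land in the same bucket, so each batch can be
    stacked into a tensor without padding or resizing.
    """
    rng = random.Random(seed)

    # Assumed bucket key: resolution plus number of conditioning images.
    buckets = defaultdict(list)
    for idx, sample in enumerate(samples):
        key = (sample["width"], sample["height"], sample["num_conditions"])
        buckets[key].append(idx)

    batches = []
    for key, indices in buckets.items():
        rng.shuffle(indices)  # randomize order inside each bucket
        for start in range(0, len(indices), batch_size):
            # The last batch of a bucket may be smaller than batch_size.
            batches.append((key, indices[start:start + batch_size]))

    rng.shuffle(batches)  # interleave buckets across the epoch
    return batches


# Toy metadata: three square single-condition samples, one portrait
# sample, and one square sample with two conditioning images.
samples = [
    {"width": 1024, "height": 1024, "num_conditions": 1},
    {"width": 768, "height": 1344, "num_conditions": 1},
    {"width": 1024, "height": 1024, "num_conditions": 1},
    {"width": 1024, "height": 1024, "num_conditions": 2},
    {"width": 1024, "height": 1024, "num_conditions": 1},
]

batches = bucket_batches(samples, batch_size=2, seed=0)
for key, indices in batches:
    print(key, indices)
```

In a real training loop the returned index lists would typically be fed to a data loader as precomputed batches. Whether the report's sampler drops, pads, or merges undersized batches is not stated in the abstract, so this sketch simply keeps them.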
Problem

Research questions and friction points this paper is trying to address.

instruction-based image editing
diffusion transformer
image editing benchmark
semantic alignment
controllable image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion transformer
instruction-based image editing
multi-stage training pipeline
asymmetric gradient optimization
REDEdit-Bench
Authors

Super Intelligence Team (Xiaohongshu Inc.)
Changhao Qiao
Chao Hui (American Battery Factory; Materials Science, Nanotechnology)
Chen Li
Cunzheng Wang
Dejia Song
Jiale Zhang
Jing Li
Qiang Xiang
Runqi Wang (Beijing Jiaotong University; Few-Shot Learning, Continual Learning, Multi-Modal)
Shuang Sun
Wei Zhu
Xu Tang
Yao Hu (Zhejiang University; Machine Learning)
Yibo Chen
Yuhao Huang (Shenzhen University; Medical Image Computing, Ultrasound, Model Robustness)
Yuxuan Duan
Zhiyi Chen
Ziyuan Guo