EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Instruction-guided image editing suffers from poor responsiveness to complex instructions and heavy reliance on iterative trial-and-error. Method: We introduce EditReward-Bench, a benchmark for evaluating reward models on editing quality, and the EditScore family of reward models, establishing for the first time a complete closed-loop pipeline from high-fidelity reward modeling to online reinforcement learning (RL) optimization. Leveraging fine-grained data construction, rigorous filtering, and generation-oriented self-ensembling, EditScore matches or surpasses GPT-5 in assessing editing quality. Contribution/Results: When integrated into RL training, EditScore substantially improves OmniGen2's instruction following and editing accuracy, demonstrating the critical role of domain-specific reward models in advancing RL-based image editing systems.
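The summary does not spell out the self-ensembling recipe; the sketch below illustrates the general idea under stated assumptions: a generative reward model is sampled several times per edit and its numeric judgments are averaged to reduce the variance of any single sampled critique. `reward_model.generate` and `critique.score` are hypothetical stand-ins, not the paper's actual API.

```python
import statistics

def self_ensemble_score(reward_model, instruction, src_image, edited_image, k=8):
    """Score an edit by sampling k independent judgments from a generative
    reward model and averaging them (a sketch of generation-oriented
    self-ensembling, with a hypothetical reward-model interface)."""
    scores = []
    for _ in range(k):
        # Assumed API: the model generates a critique that ends in a
        # numeric quality score for the (instruction, source, edit) triple.
        critique = reward_model.generate(
            instruction=instruction,
            source=src_image,
            edit=edited_image,
            temperature=1.0,  # sampling diversity is what the ensemble averages over
        )
        scores.append(critique.score)
    return statistics.mean(scores)
```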

📝 Abstract
Instruction-guided image editing has achieved remarkable progress, yet current models still face challenges with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a comprehensive benchmark to systematically evaluate reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, coupled with an effective self-ensemble strategy tailored for the generative nature of EditScore, our largest variant even surpasses GPT-5 on the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, results in a final model that shows a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is the key to unlocking the full potential of RL in this domain.
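The abstract does not name the RL algorithm; as one plausible instantiation, the sketch below shows a group-relative (GRPO-style) policy update in which the reward model supplies per-sample scores. `policy.sample`, `reward_model.score`, and `policy.update` are hypothetical stand-ins under that assumption, not the paper's confirmed method.

```python
import statistics

def rl_step(policy, reward_model, instruction, src_image, group_size=4):
    """One online RL step: sample a group of candidate edits, score them
    with the reward model, and reinforce edits that beat the group mean
    (a GRPO-style group-relative advantage; interfaces are hypothetical)."""
    candidates = [policy.sample(instruction, src_image) for _ in range(group_size)]
    rewards = [reward_model.score(instruction, src_image, c) for c in candidates]

    # Normalize rewards within the group to form advantages.
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    advantages = [(r - mean_r) / std_r for r in rewards]

    # Assumed update rule: raise the likelihood of above-average edits.
    policy.update(candidates, advantages)
```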
Problem

Research questions and friction points this paper is trying to address.

Developing high-fidelity reward models for instruction-guided image editing quality
Overcoming lack of effective reward signals for online reinforcement learning
Enabling robust policy optimization through specialized image editing evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed EditScore reward models for image editing
Introduced EditReward-Bench benchmark for evaluation
Enabled online RL training with high-fidelity rewards
👥 Authors

Xin Luo, University of Science and Technology of China (Computer Vision)
Jiahao Wang, Beijing Academy of Artificial Intelligence
Chenyuan Wu, University of Science and Technology of China
Shitao Xiao, BUPT
Xiyan Jiang, Zhejiang University
Defu Lian, University of Science and Technology of China
Jiajun Zhang, Institute of Automation, Chinese Academy of Sciences (Natural Language Processing, Large Language Models, Multimodal Information Processing)
Dong Liu, University of Science and Technology of China
Zheng Liu, Beijing Academy of Artificial Intelligence