EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Open-source image editing models lag behind closed-source systems largely because no reliable reward model exists for scaling up high-quality synthetic training data. To address this, we propose EditReward, a reward model for natural-language-guided image editing, trained on over 200,000 expert-annotated human preference pairs collected under a rigorous protocol. EditReward achieves state-of-the-art correlation with human judgments on established benchmarks including GenAI-Bench, AURORA-Bench, and ImagenHub, outperforming a wide range of VLM-as-judge baselines. Furthermore, we use EditReward to select a high-quality subset of the noisy ShareGPT-4o-Image dataset and train Step1X-Edit on it, yielding significant gains over training on the full set and raising the performance ceiling of open-source image editing systems.

📝 Abstract
Recently, we have witnessed great progress in image editing with natural language instructions. Closed-source models such as GPT-Image-1, Seedream, and Google-Nano-Banana have shown highly promising results, but open-source models still lag behind. The main bottleneck is the lack of a reliable reward model to scale up high-quality synthetic training data. To address this critical bottleneck, we built EditReward, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol. EditReward demonstrates superior alignment with human preferences in instruction-guided image editing tasks. Experiments show that EditReward achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, ImagenHub, and our new benchmark, outperforming a wide range of VLM-as-judge models. Furthermore, we use EditReward to select a high-quality subset from the existing noisy ShareGPT-4o-Image dataset. Training Step1X-Edit on this selected subset yields significant improvement over training on the full set, demonstrating EditReward's ability to serve as a reward model for scaling up high-quality image editing training data. Its strong human alignment also suggests potential for advanced applications such as reinforcement learning-based post-training and test-time scaling of image editing models. EditReward and its training dataset will be released to help the community build more high-quality image editing training datasets.
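
The page does not describe the training objective, but a standard choice for reward models trained on human preference pairs is the Bradley-Terry pairwise loss. Below is a minimal PyTorch sketch under that assumption; the `model(instruction, source, edit)` scoring interface and all field names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen, score_rejected):
    # Bradley-Terry objective: maximize the log-probability that the
    # human-preferred edit scores higher than the rejected one.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

def train_step(model, optimizer, batch):
    # Each batch holds annotated preference pairs sharing the same
    # instruction and source image (field names are hypothetical).
    s_chosen = model(batch["instruction"], batch["source"], batch["chosen_edit"])
    s_rejected = model(batch["instruction"], batch["source"], batch["rejected_edit"])
    loss = preference_loss(s_chosen, s_rejected)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```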
Problem

Research questions and friction points this paper is trying to address.

Develops a human-aligned reward model for instruction-guided image editing
Addresses the lack of reliable reward signals for open-source image editing
Builds a large-scale human preference dataset to scale up high-quality training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human preference dataset of over 200K expert-annotated pairs
Reward model closely aligned with human image editing preferences
Reward-based selection of high-quality training subsets (sketched below)
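
The selection procedure for the ShareGPT-4o-Image subset is not specified on this page; a plausible recipe is to score every example with the trained reward model and keep only the top-scoring fraction for training. A hypothetical sketch, with the `reward_model` interface, field names, and the keep-top-fraction rule all assumed for illustration:

```python
import torch

@torch.no_grad()
def select_subset(reward_model, dataset, keep_fraction=0.5):
    # Score every (instruction, source, edited) triple with the reward model.
    scored = []
    for ex in dataset:
        s = reward_model(ex["instruction"], ex["source"], ex["edited"]).item()
        scored.append((s, ex))
    # Keep the highest-scoring fraction as the curated training subset.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    cutoff = int(len(scored) * keep_fraction)
    return [ex for _, ex in scored[:cutoff]]
```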