EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Open-source image editing models lag behind closed-source systems largely because no reliable reward model exists for scaling up high-quality synthetic training data. To address this, we propose EditReward, a reward model for natural-language-guided image editing, trained on over 200,000 expert-annotated human preference pairs collected under a rigorous protocol. EditReward achieves state-of-the-art correlation with human judgments on established benchmarks including GenAI-Bench, AURORA-Bench, and ImagenHub, outperforming a wide range of VLM-as-judge baselines. Furthermore, we use EditReward to select a high-quality subset of the noisy ShareGPT-4o-Image dataset and train Step1X-Edit on it, yielding significant gains over training on the full set and raising the performance ceiling of open-source image editing systems.

📝 Abstract
Recently, we have witnessed great progress in image editing with natural language instructions. Closed-source models such as GPT-Image-1, Seedream, and Google-Nano-Banana have shown highly promising results, but open-source models still lag behind. The main bottleneck is the lack of a reliable reward model to scale up high-quality synthetic training data. To address this critical bottleneck, we built EditReward, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol. EditReward demonstrates superior alignment with human preferences in instruction-guided image editing tasks. Experiments show that EditReward achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, ImagenHub, and our new benchmark, outperforming a wide range of VLM-as-judge models. Furthermore, we use EditReward to select a high-quality subset from the existing noisy ShareGPT-4o-Image dataset. Training Step1X-Edit on this selected subset yields significant improvement over training on the full set, demonstrating EditReward's ability to serve as a reward model for scaling up high-quality image editing training data. Its strong human alignment also suggests potential for advanced applications such as reinforcement learning-based post-training and test-time scaling of image editing models. EditReward and its training dataset will be released to help the community build more high-quality image editing training datasets.
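
The page does not describe the training objective, but a standard choice for reward models trained on human preference pairs is the Bradley-Terry pairwise loss. Below is a minimal PyTorch sketch under that assumption; the `model(instruction, source, edit)` scoring interface and all field names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen, score_rejected):
    # Bradley-Terry objective: maximize the log-probability that the
    # human-preferred edit scores higher than the rejected one.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

def train_step(model, optimizer, batch):
    # Each batch holds annotated preference pairs sharing the same
    # instruction and source image (field names are hypothetical).
    s_chosen = model(batch["instruction"], batch["source"], batch["chosen_edit"])
    s_rejected = model(batch["instruction"], batch["source"], batch["rejected_edit"])
    loss = preference_loss(s_chosen, s_rejected)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```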
Problem

Research questions and friction points this paper is trying to address.

Develops a human-aligned reward model for instruction-guided image editing
Addresses the lack of reliable reward signals for open-source image editing
Builds a large-scale human preference dataset to scale up high-quality training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human preference dataset of over 200K expert-annotated pairs
Reward model closely aligned with human image editing preferences
Reward-based selection of high-quality training subsets (sketched below)
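
The selection procedure for the ShareGPT-4o-Image subset is not specified on this page; a plausible recipe is to score every example with the trained reward model and keep only the top-scoring fraction for training. A hypothetical sketch, with the `reward_model` interface, field names, and the keep-top-fraction rule all assumed for illustration:

```python
import torch

@torch.no_grad()
def select_subset(reward_model, dataset, keep_fraction=0.5):
    # Score every (instruction, source, edited) triple with the reward model.
    scored = []
    for ex in dataset:
        s = reward_model(ex["instruction"], ex["source"], ex["edited"]).item()
        scored.append((s, ex))
    # Keep the highest-scoring fraction as the curated training subset.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    cutoff = int(len(scored) * keep_fraction)
    return [ex for _, ex in scored[:cutoff]]
```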