🤖 AI Summary
Current diffusion-based image editing methods lack scalable human-preference data and a dedicated framework for alignment training. This work proposes HP-Edit, the first preference-aligned post-training paradigm tailored to image editing. It first builds an automatic evaluator, HP-Scorer, from a small set of human preference ratings and a pretrained vision-language model. HP-Scorer is then used in two ways: to construct RealPref-50K, a large-scale preference dataset, and to serve as the reward function in reinforcement learning (RL) post-training. The approach brings edited images into closer alignment with human preferences, performs strongly on the RealPref-Bench benchmark, and improves the real-world editing quality of models such as Qwen-Image-Edit-2509.
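The summary does not say what format RealPref-50K uses. One common way to turn evaluator scores into preference data is to pair the best- and worst-scored edits for each (source image, instruction) input, as in DPO-style training. The sketch below shows that idea under stated assumptions: the scores are hypothetical HP-Scorer outputs, and `build_preference_pairs`, `PreferencePair`, and the `min_margin` filter are illustrative names, not the paper's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    sample_id: str  # identifies one (source image, instruction) input
    chosen: str     # id of the higher-scored edit
    rejected: str   # id of the lower-scored edit
    margin: float   # score gap between chosen and rejected


def build_preference_pairs(
    scored: dict[str, list[tuple[str, float]]], min_margin: float = 1.0
) -> list[PreferencePair]:
    """Assumed pairing rule: best vs. worst edit per input.

    `scored` maps sample_id -> [(edit_id, score), ...], where each score is
    assumed to come from a hypothetical human-preference evaluator.
    Pairs whose score gap is below `min_margin` are dropped as too ambiguous.
    """
    pairs = []
    for sample_id, edits in scored.items():
        if len(edits) < 2:
            continue  # a preference needs at least two candidates
        best = max(edits, key=lambda e: e[1])
        worst = min(edits, key=lambda e: e[1])
        margin = best[1] - worst[1]
        if margin >= min_margin:
            pairs.append(PreferencePair(sample_id, best[0], worst[0], margin))
    return pairs


scored = {
    "s1": [("a", 4.5), ("b", 2.0), ("c", 3.1)],
    "s2": [("d", 3.0), ("e", 3.2)],  # gap 0.2 is below min_margin, so dropped
}
pairs = build_preference_pairs(scored, min_margin=1.0)
```

The margin filter is a design choice, not something the summary specifies: discarding near-ties trades dataset size for cleaner preference labels.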
📝 Abstract
Powerful generative diffusion models have become the leading paradigm for real-world image editing. Although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, owing to the lack of scalable human-preference datasets and of frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset that spans eight common editing tasks with balanced coverage of common objects. Specifically, HP-Edit uses a small amount of human-preference scoring data and a pretrained visual large language model (VLM) to develop HP-Scorer, an automatic evaluator aligned with human preferences. We then use HP-Scorer both to efficiently build a scalable preference dataset and as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly improves models such as Qwen-Image-Edit-2509, bringing their outputs closer to human preferences.
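The abstract uses HP-Scorer as the reward in RL post-training and cites Flow-GRPO as related work, but it does not give the exact objective. As a rough illustration only, the sketch below shows the group-relative advantage used by GRPO-style methods: several edits are sampled for the same input, each is scored, and rewards are normalized within the group as $A_i = (r_i - \bar{r}) / (\sigma_r + \epsilon)$. The `toy_scorer` is a hypothetical stand-in for HP-Scorer, and whether HP-Edit normalizes this way is an assumption.

```python
import statistics
from typing import Callable, Sequence


def score_group(
    scorer: Callable[[str, str, str], float],
    source: str,
    instruction: str,
    edits: Sequence[str],
) -> list[float]:
    """Score every sampled edit of one (source, instruction) input."""
    return [scorer(source, instruction, e) for e in edits]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style normalization: A_i = (r_i - mean(r)) / (std(r) + eps).

    Edits scored above the group mean get positive advantages and are
    reinforced; those below the mean are pushed down.
    """
    if len(rewards) < 2:
        return [0.0 for _ in rewards]  # no within-group comparison possible
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical scorer for illustration: longer edit ids get higher scores.
def toy_scorer(source: str, instruction: str, edit: str) -> float:
    return float(len(edit))


rewards = score_group(toy_scorer, "img.png", "make the sky red", ["e", "ee", "eee"])
advantages = group_relative_advantages(rewards)
```

Normalizing within the group means only relative quality among edits of the same input matters, so the evaluator's absolute score scale does not need to be calibrated across different inputs.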