HP-Edit: A Human-Preference Post-Training Framework for Image Editing

📅 2026-04-21

🤖 AI Summary

Current diffusion-based image editing methods lack both scalable human preference data and a dedicated alignment training framework. This work proposes HP-Edit, the first preference-aligned post-training paradigm tailored for image editing. It begins by constructing an automated evaluator, HP-Scorer, from a small set of human preference ratings and a pretrained vision-language model. HP-Scorer is then used in two ways: to build RealPref-50K, a large-scale preference dataset, and to serve as the reward function in reinforcement learning–based post-training. The approach brings edited images closer to human preferences, as measured on the RealPref-Bench benchmark, and improves the real-world editing quality of models such as Qwen-Image-Edit-2509.

📝 Abstract

Powerful generative diffusion models are the leading paradigm for common real-world image editing tasks. Although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, owing to a lack of scalable human-preference datasets and of frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset that spans eight common editing tasks and is balanced across commonly edited objects. Specifically, HP-Edit leverages a small amount of human-preference scoring data and a pretrained vision-language model (VLM) to develop HP-Scorer, an automatic, human preference-aligned evaluator. We then use HP-Scorer both to efficiently build a scalable preference dataset and to serve as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preference.
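To make the "scorer as reward function" idea concrete, here is a minimal sketch of how per-candidate scores could be turned into a training signal. It assumes a group-relative normalization in the style of GRPO; the abstract cites Flow-GRPO as prior work but does not state which RL objective HP-Edit uses, so this is an illustration, not the authors' method. The names `toy_scorer`, `score_group`, and `group_relative_advantages` are hypothetical, and the toy scorer merely counts instruction words found in a text description, standing in for a VLM-based evaluator such as HP-Scorer.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

# Hypothetical scorer signature: (source, instruction, edited) -> scalar reward.
Scorer = Callable[[str, str, str], float]


def toy_scorer(source: str, instruction: str, edited: str) -> float:
    """Stand-in for a learned evaluator (assumption, not HP-Scorer itself).

    Counts how many instruction words appear in the edited description.
    """
    return float(sum(word in edited for word in instruction.split()))


def score_group(scorer: Scorer, source: str, instruction: str,
                candidates: Sequence[str]) -> List[float]:
    """Score every candidate edit produced for one (source, instruction) pair."""
    return [scorer(source, instruction, c) for c in candidates]


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-6) -> List[float]:
    """Normalize rewards within a group: (r - mean) / (std + eps).

    Candidates scored above the group mean get a positive advantage,
    those below get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    candidates = ["a cat", "a cat with a hat", "a cat with a red hat"]
    rewards = score_group(toy_scorer, "a cat", "add red hat", candidates)
    for cand, adv in zip(candidates, group_relative_advantages(rewards)):
        print(f"{adv:+.3f}  {cand}")
```

In an actual post-training loop, these advantages would weight the policy update for each sampled edit; normalizing within the group keeps the signal relative, so only the ranking and spread of the scorer's outputs matter, not their absolute scale.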

Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning from Human Feedback

diffusion-based image editing

human preference alignment

preference dataset

image editing framework

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human Preference

Reinforcement Learning from Human Feedback (RLHF)

Diffusion Models