EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing

📅 2026-03-16
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of current text-guided image editing models, which often suffer from artifacts, instruction misalignment, or suboptimal aesthetics, and which lack a scalable human preference evaluation framework. The authors introduce EditHF-1M, the first million-scale, fine-grained human preference dataset for image editing, comprising 29 million pairwise comparisons and 148,000 mean opinion scores. Building on it, they propose a general-purpose evaluation and reward mechanism based on multimodal large language models: they train EditHF, an evaluation model aligned with human preferences, and develop EditHF-Reward, a reinforcement learning pipeline that uses EditHF as the reward signal. This approach jointly optimizes editing quality, instruction fidelity, and attribute preservation, and demonstrates strong cross-dataset generalization and alignment with human judgments. Fine-tuning Qwen-Image-Edit with this reward model significantly improves editing performance, validating the effectiveness and scalability of the proposed framework.
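
To make the reward signal concrete: a natural way to turn the three feedback dimensions into a single scalar for reinforcement learning is a weighted combination. The sketch below is illustrative only; the field names and equal weighting are assumptions, not details from the paper.

```python
from dataclasses import dataclass

@dataclass
class EditScores:
    """Per-dimension scores an evaluation model might emit (hypothetical fields)."""
    visual_quality: float          # artifact-free, aesthetically pleasing
    instruction_alignment: float   # how faithfully the edit follows the instruction
    attribute_preservation: float  # unrelated content left unchanged

def scalar_reward(s: EditScores, weights: tuple = (1.0, 1.0, 1.0)) -> float:
    """Collapse the three dimensions into one reward for RL fine-tuning.

    Equal weights are an assumption; the paper does not state how the
    dimensions are combined into the final reward.
    """
    wq, wa, wp = weights
    return (wq * s.visual_quality
            + wa * s.instruction_alignment
            + wp * s.attribute_preservation) / (wq + wa + wp)
```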

📝 Abstract
Recent text-guided image editing (TIE) models have achieved remarkable progress, yet many edited images still suffer from issues such as artifacts, unexpected edits, and unaesthetic content. Although some benchmarks and methods have been proposed for evaluating edited images, scalable evaluation models are still lacking, which limits the development of human feedback reward models for image editing. To address these challenges, we first introduce EditHF-1M, a million-scale image editing dataset with over 29M human preference pairs and 148K human mean opinion ratings, both evaluated along three dimensions: visual quality, instruction alignment, and attribute preservation. Based on EditHF-1M, we propose EditHF, a multimodal large language model (MLLM) based evaluation model that provides human-aligned feedback for image editing. Finally, we introduce EditHF-Reward, which uses EditHF as the reward signal to optimize text-guided image editing models through reinforcement learning. Extensive experiments show that EditHF achieves superior alignment with human preferences and generalizes strongly to other datasets. Furthermore, we fine-tune Qwen-Image-Edit using EditHF-Reward, achieving significant performance improvements, which demonstrates EditHF's ability to serve as a reward model for scaling up image editing. Both the dataset and code will be released in our GitHub repository: https://github.com/IntMeGroup/EditHF.
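
Since most of EditHF-1M consists of pairwise comparisons, the evaluation model is presumably trained with a ranking objective. Below is a minimal sketch of the standard Bradley-Terry pairwise loss for such data, assuming a scorer that maps (source image, instruction, edited image) to a scalar; this is the common recipe for preference data, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(score_chosen: torch.Tensor,
                             score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry ranking loss over human preference pairs.

    Both inputs are (batch,) tensors of scalar scores produced by the
    evaluation model for the human-preferred and dispreferred edits of
    the same source image and instruction.
    """
    # Minimize -log sigmoid(s_chosen - s_rejected): the preferred edit
    # should score higher than the rejected one.
    return -F.logsigmoid(score_chosen - score_rejected).mean()
```

With 29M pairs, such a loss could be applied per dimension (visual quality, instruction alignment, attribute preservation), while the 148K mean opinion scores could supervise an auxiliary regression head.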
Problem

Research questions and friction points this paper is trying to address.

image editing
human preference
evaluation model
text-guided editing
reward modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

human preference feedback
image editing evaluation
multimodal large language model
reinforcement learning reward
instruction alignment