EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement

📅 2026-05-08
📈 Citations: 0
Influential: 0

🤖 AI Summary
Text-guided image editing often leaves fine-grained artifacts such as object distortion, lighting mismatch, or unintended modifications, and existing refinement methods, which rely on costly regeneration or on vision-language models with weak spatial grounding, tend to cause semantic drift or unreliable local corrections. To address this, the work introduces EditFHF-15K, a dataset of fine-grained human feedback on edited images, and proposes EditRefiner, a framework in which four agents collaborate under the guidance of that feedback. The agents emulate a perception-reasoning-action-evaluation loop to localize, diagnose, and locally re-edit editing errors. Combining context-aware saliency perception, diagnostic reasoning, re-editing planning, and multi-dimensional quality assessment, EditRefiner is reported to outperform existing approaches in distortion localization accuracy, diagnostic correctness, and alignment with human perceptual judgments, which the authors present as a new paradigm for reliable, self-correcting image editing.
📝 Abstract
Recent text-guided image editing (TIE) models have made remarkable progress, yet edited images still frequently suffer from fine-grained issues such as unnatural objects, lighting mismatch, and unexpected changes. Existing refinement approaches either rely on costly iterative regeneration or employ vision-language models (VLMs) with weak spatial grounding, often resulting in semantic drift and unreliable local corrections. To address these limitations, we first construct EditFHF-15K, a dataset of fine-grained human feedback for edited images, comprising (1) 15K images from 12 TIE models spanning 43 editing tasks, (2) 60K annotated artifact regions and 80K editing failure regions, each accompanied by textual reasoning, and (3) 45K mean opinion scores (MOSs) assessing perceptual quality, instruction following, and visual consistency. Based on EditFHF-15K, we propose EditRefiner, a hierarchical, interpretable, and human-aligned agentic framework that reformulates post-editing correction as a human-like perception-reasoning-action-evaluation loop. Specifically, we introduce: (1) a perception agent that detects contextual saliency maps of artifacts and editing failures, (2) a reasoning agent that interprets these perceptual cues to perform human-aligned diagnostic inference, (3) an action agent that uses the reasoning output to plan and execute localized re-editing, and (4) an evaluation agent that assesses the re-edited image and guides the action agent on whether further refinements are required. Extensive experiments demonstrate that EditRefiner consistently outperforms state-of-the-art methods in distortion localization, diagnostic accuracy, and human perception alignment, establishing a new paradigm for self-corrective and perceptually reliable image editing. The code is available at https://github.com/IntMeGroup/EditRefiner.
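The control flow of the four-agent loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Region`, `Diagnosis`, `Scores`, `refine`, the agent callables, `threshold`, `max_rounds`) is hypothetical, the agents are passed in as plain functions rather than models, and the 1-to-5 score scale and the stopping rule are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Region:
    """A localized problem area (hypothetical structure)."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1)
    kind: str                       # "artifact" or "editing_failure"


@dataclass
class Diagnosis:
    """A region paired with a textual explanation of what went wrong."""
    region: Region
    reason: str


@dataclass
class Scores:
    """The three dimensions named in the abstract; a 1-5 scale is assumed."""
    perceptual_quality: float
    instruction_following: float
    visual_consistency: float

    def passes(self, threshold: float) -> bool:
        # Assumed stopping rule: every dimension must reach the threshold.
        return min(self.perceptual_quality,
                   self.instruction_following,
                   self.visual_consistency) >= threshold


def refine(
    image: Any,
    instruction: str,
    perceive: Callable[[Any, str], List[Region]],
    reason: Callable[[Any, str, List[Region]], List[Diagnosis]],
    act: Callable[[Any, str, List[Diagnosis]], Any],
    evaluate: Callable[[Any, str], Scores],
    threshold: float = 4.0,
    max_rounds: int = 3,
) -> Tuple[Any, List[Scores]]:
    """Run perception -> reasoning -> action -> evaluation until the
    evaluator is satisfied, nothing is detected, or rounds run out."""
    history: List[Scores] = []
    for _ in range(max_rounds):
        regions = perceive(image, instruction)           # perception agent
        if not regions:
            break                                        # nothing left to fix
        diagnoses = reason(image, instruction, regions)  # reasoning agent
        image = act(image, instruction, diagnoses)       # localized re-edit
        scores = evaluate(image, instruction)            # evaluation agent
        history.append(scores)
        if scores.passes(threshold):
            break
    return image, history
```

Bounding the loop with `max_rounds` is a design choice of this sketch, intended to reflect the abstract's concern about costly iterative regeneration; the paper's actual termination criterion is not stated in the abstract.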
Problem

Research questions and friction points this paper is trying to address.

text-guided image editing
image editing refinement
fine-grained artifacts
semantic drift
spatial grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

human-aligned refinement
agentic framework
fine-grained feedback
perception-reasoning-action-evaluation loop
localized re-editing