🤖 AI Summary
Text-guided image editing lacks a systematic evaluation framework. Method: We introduce EditInspector, a human-annotated, fine-grained benchmark built from an extensive edit-verification template, covering six dimensions: accuracy, artifact detection, visual quality, seamless integration with the image scene, adherence to common sense, and description of edit-induced changes. To improve artifact detection and edit-difference characterization while mitigating hallucination in large models, we propose a difference-aware modeling approach and a generative change-description method, and we evaluate state-of-the-art (SoTA) vision-language models against the human annotations. Contribution/Results: Experiments show that current SoTA models struggle to evaluate edits comprehensively and frequently hallucinate when describing changes; our two proposed methods outperform SoTA models on both artifact detection and difference caption generation.
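A minimal sketch of how such difference-aware evaluation could be wired up, assuming `query_vlm` is a hypothetical stand-in for any vision-language model API, and that the change-mask threshold and crop padding are illustrative choices rather than the paper's actual parameters:

```python
# Sketch of difference-aware change description (illustrative, not the
# paper's exact method): localize the edited region from the image pair,
# then ground the VLM on that crop to limit hallucinated differences.
import numpy as np
from PIL import Image


def change_mask(original: Image.Image, edited: Image.Image, thresh: int = 25) -> np.ndarray:
    """Binary mask of pixels that changed noticeably between the two images."""
    a = np.asarray(original.convert("RGB"), dtype=np.int16)
    b = np.asarray(edited.convert("RGB"), dtype=np.int16)
    diff = np.abs(a - b).max(axis=-1)  # max per-pixel channel difference
    return diff > thresh


def changed_bbox(mask: np.ndarray, pad: int = 16):
    """Padded (left, top, right, bottom) box around the changed region, or None."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # no detectable change
    h, w = mask.shape
    return (int(max(xs.min() - pad, 0)), int(max(ys.min() - pad, 0)),
            int(min(xs.max() + pad, w)), int(min(ys.max() + pad, h)))


def describe_edit(original: Image.Image, edited: Image.Image, query_vlm) -> str:
    """Ask the VLM about the changed crop only; `query_vlm` is hypothetical."""
    box = changed_bbox(change_mask(original, edited))
    if box is None:
        return "No visible change detected."
    crops = [original.crop(box), edited.crop(box)]
    return query_vlm(
        images=crops,
        prompt="Describe only the difference between these two crops. "
               "Note any artifacts (blur, warped geometry, duplicated objects).",
    )
```

Grounding the model on the changed crop rather than the full image pair shrinks the surface on which it can hallucinate spurious differences, which is the intuition behind difference-aware modeling.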
📝 Abstract
Text-guided image editing, fueled by recent advancements in generative AI, is becoming increasingly widespread. This trend highlights the need for a comprehensive framework to verify text-guided edits and assess their quality. To address this need, we introduce EditInspector, a novel benchmark for the evaluation of text-guided image edits, based on human annotations collected using an extensive template for edit verification. We leverage EditInspector to evaluate the performance of state-of-the-art (SoTA) vision and language models in assessing edits across various dimensions, including accuracy, artifact detection, visual quality, seamless integration with the image scene, adherence to common sense, and the ability to describe edit-induced changes. Our findings indicate that current models struggle to evaluate edits comprehensively and frequently hallucinate when describing the changes. To address these challenges, we propose two novel methods that outperform SoTA models in both artifact detection and difference caption generation.
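To make the multi-dimensional verification template concrete, here is a hedged sketch of a six-dimension rubric in the spirit of the one described above; the question wording, `RUBRIC` keys, and `answer_fn` hook are illustrative assumptions, not EditInspector's actual annotation schema:

```python
# Illustrative six-dimension verification rubric (assumed wording, not the
# paper's schema). `answer_fn` is a hypothetical hook that could be backed by
# either a human annotation UI or a vision-language model call.
from dataclasses import dataclass, field

RUBRIC = {
    "accuracy":       "Does the edit carry out the requested instruction?",
    "artifacts":      "Did the edit introduce any visual artifacts?",
    "visual_quality": "Is the edited region visually high quality?",
    "integration":    "Does the edit blend seamlessly with the scene?",
    "common_sense":   "Is the result physically and semantically plausible?",
    "change_caption": "Describe all changes between the two images.",
}


@dataclass
class EditVerdict:
    """One annotator's (or model's) judgment of a single edit."""
    scores: dict = field(default_factory=dict)  # dimension -> bool or text


def verify_edit(instruction: str, answer_fn) -> EditVerdict:
    """Pose each rubric question for one edit and collect the answers."""
    verdict = EditVerdict()
    for dim, question in RUBRIC.items():
        verdict.scores[dim] = answer_fn(f"Instruction: {instruction}\n{question}")
    return verdict
```

The same rubric can drive both human annotation (with `answer_fn` backed by an annotation interface) and automatic evaluation (with `answer_fn` backed by a VLM), which is what allows model judgments to be compared against human ones dimension by dimension.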