LMM4Edit: Benchmarking and Evaluating Multimodal Image Editing with LMMs

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-guided image editing (TIE) models struggle to simultaneously achieve high image fidelity, precise edit alignment, and strong preservation of original content. Moreover, large-scale, high-fidelity human preference benchmarks for systematic evaluation remain absent. To address these limitations, we introduce EBench-18K—the first comprehensive, 18K-scale TIE benchmark featuring fine-grained human preference annotations. We further propose LMM4Edit, the first unified, multimodal large language model (MLLM)-based automatic evaluation metric for TIE, jointly modeling four critical dimensions: image quality, edit alignment, attribute preservation, and question-answering accuracy. LMM4Edit enables end-to-end, MLLM-driven TIE assessment for the first time, achieving strong correlation with human judgments (Spearman’s ρ > 0.85) and demonstrating robust zero-shot transferability—consistently outperforming existing metrics across diverse datasets. Both the benchmark data and evaluation code are publicly released.

📝 Abstract
The rapid advancement of Text-guided Image Editing (TIE) enables image modification through text prompts. However, current TIE models still struggle to balance image quality, editing alignment, and consistency with the original image, limiting their practical applications. Existing TIE evaluation benchmarks and metrics are limited in scale or in alignment with human perception. To this end, we introduce EBench-18K, the first large-scale image Editing Benchmark, comprising 18K edited images with fine-grained human preference annotations for evaluating TIE. Specifically, EBench-18K includes 1,080 source images with corresponding editing prompts across 21 tasks, 18K+ edited images produced by 17 state-of-the-art TIE models, 55K+ mean opinion scores (MOSs) assessed along three evaluation dimensions, and 18K+ question-answering (QA) pairs. Based on EBench-18K, we employ outstanding LMMs to assess edited images, while the evaluation results, in turn, provide insights into the alignment between LMMs' understanding ability and human preferences. We then propose LMM4Edit, an LMM-based metric for evaluating image editing models in terms of perceptual quality, editing alignment, attribute preservation, and task-specific QA accuracy in an all-in-one manner. Extensive experiments show that LMM4Edit achieves outstanding performance and aligns well with human preference. Zero-shot validation on other datasets also demonstrates the generalization ability of our model. The dataset and code are available at https://github.com/IntMeGroup/LMM4Edit.
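Metrics like LMM4Edit are typically validated by computing Spearman's rank correlation (SRCC) between the metric's predicted scores and the human MOSs collected in the benchmark. Below is a minimal, self-contained sketch of that validation step; the score values are hypothetical and for illustration only, not drawn from EBench-18K.

```python
# Sketch: validating an automatic editing metric against human judgments
# via Spearman's rank correlation (SRCC). All score values are hypothetical.

def ranks(values):
    """Assign 1-based average ranks to values (handles ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical metric scores vs. human MOSs for five edited images.
metric_scores = [0.81, 0.42, 0.93, 0.55, 0.67]
human_moss = [4.1, 2.0, 4.8, 2.9, 3.5]
print(round(spearman(metric_scores, human_moss), 3))  # → 1.0 (identical ranking)
```

Because SRCC depends only on rank order, it rewards a metric that orders edited images the same way humans do, regardless of the raw score scale.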
Problem

Research questions and friction points this paper is trying to address.

Balancing image quality, editing alignment, and consistency in TIE models
Addressing limitations in current TIE evaluation benchmarks and metrics
Aligning LMMs' understanding ability with human preferences for image editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces EBench-18K benchmark for TIE evaluation
Proposes LMM4Edit for all-in-one image editing assessment
Uses LMMs to align evaluation with human preferences
Zitong Xu
Shanghai Jiao Tong University
Image Quality Assessment, Image Editing
Huiyu Duan
Shanghai Jiao Tong University
Multimedia Signal Processing
Bingnan Liu
University of Electronic Science and Technology of China
Guangji Ma
University of Electronic Science and Technology of China
Jiarui Wang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Liu Yang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Shiqi Gao
Beihang University
Xiaoyu Wang
University of Electronic Science and Technology of China
Jia Wang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Xiongkuo Min
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University
Multimedia Signal Processing, Visual Quality Assessment, QoE, AI Evaluation, Displays
Weisi Lin
President's Chair Professor in Computer Science, CCDS, Nanyang Technological University
Perception-inspired signal modeling, perceptual multimedia quality evaluation, video compression, image processing & analysis