I2I-Bench: A Comprehensive Benchmark Suite for Image-to-Image Editing Models

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image editing benchmarks suffer from narrow task coverage, coarse-grained evaluation metrics, and heavy reliance on manual annotations, which limits their scalability and practical utility. To address these limitations, we propose the first comprehensive, fully automated benchmark for image-to-image editing, encompassing ten single- and multi-image editing tasks and thirty disentangled, fine-grained evaluation dimensions. Our framework integrates domain-specific evaluation tools with large multimodal models (LMMs) into a multi-task, multi-dimensional, fully automated hybrid assessment pipeline, empirically validated to align closely with human preferences (Spearman's ρ > 0.85). We systematically evaluate leading image editing models, uncovering inherent trade-offs among fidelity, consistency, and semantic controllability. All code, data, and evaluation tools are publicly released to foster reproducible research and community advancement.
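The alignment validation the summary mentions boils down to computing Spearman's rank correlation between each dimension's automated scores and human preference ratings. A minimal self-contained sketch (not the paper's code; the scores below are hypothetical) looks like this:

```python
# Illustrative sketch: measuring agreement between automated benchmark
# scores and human preference ratings with Spearman's rank correlation,
# the alignment statistic reported in the summary (rho > 0.85).

def rank(values):
    """Return 1-based ranks of `values`; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # group equal values so they share an average rank
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical automated scores vs. 1-5 human ratings for 8 edited images
automated = [0.91, 0.47, 0.78, 0.33, 0.85, 0.62, 0.70, 0.55]
human = [4.5, 2.0, 4.0, 1.5, 4.8, 3.0, 3.5, 2.8]
rho = spearman_rho(automated, human)
print(f"Spearman's rho = {rho:.3f}")  # → Spearman's rho = 0.976
```

A per-dimension rho above the paper's reported 0.85 threshold would indicate that the automated metric ranks model outputs essentially the way human annotators do.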

📝 Abstract
Image editing models are advancing rapidly, yet comprehensive evaluation remains a significant challenge. Existing image editing benchmarks generally suffer from limited task scopes, insufficient evaluation dimensions, and heavy reliance on manual annotations, which significantly constrain their scalability and practical applicability. To address this, we propose I2I-Bench, a comprehensive benchmark for image-to-image editing models, which features (i) diverse tasks, encompassing 10 task categories across both single-image and multi-image editing tasks, (ii) comprehensive evaluation dimensions, including 30 decoupled and fine-grained evaluation dimensions with automated hybrid evaluation methods that integrate specialized tools and large multimodal models (LMMs), and (iii) rigorous alignment validation, justifying the consistency between our benchmark evaluations and human preferences. Using I2I-Bench, we benchmark numerous mainstream image editing models, investigating the gaps and trade-offs between editing models across various dimensions. We will open-source all components of I2I-Bench to facilitate future research.
Problem

Research questions and friction points this paper is trying to address.

Evaluates image editing models across diverse tasks
Automates assessment with hybrid tools and LMMs
Validates alignment between benchmark results and human preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated hybrid evaluation with specialized tools and LMMs
Decoupled fine-grained evaluation across 30 dimensions
Comprehensive benchmark covering 10 diverse editing tasks
Juntong Wang
Shanghai Jiao Tong University
VQA, LMMs, RL
Jiarui Wang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
Huiyu Duan
Shanghai Jiao Tong University
Multimedia Signal Processing
Jiaxiang Kang
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Professor, IEEE Fellow, Shanghai Jiao Tong University
Multimedia Signal Processing, Visual Quality Assessment, QoE, AI Evaluation, Displays
Xiongkuo Min
Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China