🤖 AI Summary
Existing image editing benchmarks suffer from narrow task coverage, coarse-grained evaluation metrics, and heavy reliance on manual annotation, which limits their scalability and practical utility. To address these limitations, we propose the first comprehensive, fully automated benchmark for image-to-image editing, encompassing ten single- and multi-image editing tasks and thirty disentangled, fine-grained evaluation dimensions. Our framework integrates domain-specific evaluation tools with large multimodal models (LMMs) into a multi-task, multi-dimensional, fully automated hybrid assessment pipeline, empirically validated to align closely with human preferences (Spearman's ρ > 0.85). We systematically evaluate leading image editing models, uncovering inherent trade-offs among fidelity, consistency, and semantic controllability, and publicly release all code, data, and evaluation tools to foster reproducible research and community advancement.
📝 Abstract
Image editing models are advancing rapidly, yet comprehensive evaluation remains a significant challenge. Existing image editing benchmarks generally suffer from limited task scope, insufficient evaluation dimensions, and heavy reliance on manual annotation, which significantly constrains their scalability and practical applicability. To address this, we propose **I2I-Bench**, a comprehensive benchmark for image-to-image editing models, which features (i) diverse tasks, encompassing 10 task categories spanning both single-image and multi-image editing, (ii) comprehensive evaluation dimensions, including 30 decoupled, fine-grained evaluation dimensions scored by automated hybrid evaluation methods that combine specialized tools with large multimodal models (LMMs), and (iii) rigorous alignment validation, demonstrating the consistency between our benchmark evaluations and human preferences. Using I2I-Bench, we benchmark numerous mainstream image editing models, investigating the gaps and trade-offs between editing models across various dimensions. We will open-source all components of I2I-Bench to facilitate future research.
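The alignment validation above hinges on rank correlation between automated benchmark scores and human preference ratings. A minimal sketch of how such a check might be computed is below; the per-model score lists are illustrative placeholders, not data from the paper, and the helper names (`rankdata`, `spearman_rho`) are our own.

```python
# Hypothetical sketch: checking that automated benchmark scores agree
# with human preference ratings via Spearman's rank correlation.
# Score lists are illustrative, not taken from the paper.

def rankdata(values):
    """Assign 1-based average ranks to values, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative per-model scores: automated metric vs. human preference.
automated = [0.82, 0.61, 0.74, 0.55, 0.90]
human = [0.80, 0.58, 0.70, 0.60, 0.88]
print(f"Spearman's rho = {spearman_rho(automated, human):.3f}")
```

In practice a library routine such as `scipy.stats.spearmanr` would be used, which also reports a p-value; the pure-Python version here just makes the rank-then-correlate computation explicit.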