🤖 AI Summary
While existing image editing models excel at instruction-driven tasks, their knowledge reasoning capabilities remain inadequately evaluated. Method: We introduce KRIS-Bench, the first benchmark dedicated to knowledge reasoning in image editing, covering factual, conceptual, and procedural knowledge across 22 tasks and 1,267 high-quality samples. We propose a cognition-inspired knowledge taxonomy, define “Knowledge Plausibility” as a novel evaluation metric, and integrate knowledge-aware prompting, multi-dimensional human annotation, human-AI collaborative calibration, and human-factor experiments for rigorous assessment. Contribution/Results: A comprehensive evaluation of 10 state-of-the-art models reveals substantial deficiencies in knowledge reasoning, underscoring the critical need for knowledge-centered evaluation to advance intelligent image editing. KRIS-Bench establishes a foundational framework for systematic, interpretable, and human-aligned assessment of knowledge-infused visual generation.
📝 Abstract
Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for knowledge-based reasoning in editing tasks remains under-explored. In this paper, we introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. Drawing on educational theory, KRIS-Bench categorizes editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. Based on this taxonomy, we design 22 representative tasks spanning 7 reasoning dimensions and release 1,267 high-quality annotated editing instances. To support fine-grained evaluation, we propose a comprehensive protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies. Empirical results on 10 state-of-the-art models reveal significant gaps in reasoning performance, highlighting the need for knowledge-centric benchmarks to advance the development of intelligent image editing systems.