🤖 AI Summary
Existing image editing benchmarks are too narrow to comprehensively assess models’ cognitive and creative capabilities. To address this, we propose WiseEdit, the first knowledge-intensive benchmark for image editing evaluation. By analogy with the human creative process (awareness → interpretation → imagination), it organizes editing into a cognitively inspired, three-level cascaded task framework and systematically integrates three types of knowledge from cognitive science: declarative, procedural, and metacognitive. Combined with knowledge-augmented test case construction, this yields 1,220 editing test cases that require deep multimodal reasoning. It supports fine-grained evaluation of generated edits across semantic fidelity, structural coherence, and creative synthesis. Extensive experiments reveal significant deficiencies in current state-of-the-art models in knowledge-guided cognitive reasoning and combinatorial creativity, supporting WiseEdit’s effectiveness and rigor in evaluating higher-order image editing capabilities.
📝 Abstract
Recent image editing models exhibit increasingly intelligent capabilities, enabling cognition- and creativity-informed image editing. Yet existing benchmarks are too narrow in scope to holistically assess these advanced abilities. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring both task depth and knowledge breadth. By analogy with human cognitive creation, WiseEdit decomposes image editing into three cascaded steps (Awareness, Interpretation, and Imagination), each paired with a task that challenges models at that specific step. It also includes complex tasks in which none of the three steps is easy to complete. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive. In total, WiseEdit comprises 1,220 test cases, which objectively reveal the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition. The benchmark, evaluation code, and the generated images of each model will be made publicly available soon. Project Page: https://qnancy.github.io/wiseedit_project_page/