WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

📅 2025-11-29

🤖 AI Summary

Existing image editing benchmarks are too narrow to comprehensively assess models' cognitive and creative capabilities. To address this, we propose WiseEdit, a knowledge-intensive benchmark for image editing evaluation. Drawing an analogy to the human creative process, it decomposes editing into three cascaded steps (Awareness → Interpretation → Imagination), each posing a dedicated task. It adds complex tasks in which none of the three steps is easy. It also systematically incorporates three categories of knowledge from cognitive science: declarative, procedural, and metacognitive. Combined with knowledge-augmented test case construction, this yields 1,220 test cases that require deep multimodal reasoning and enable fine-grained evaluation of semantic fidelity, structural coherence, and creative synthesis. Extensive experiments reveal significant deficiencies in current state-of-the-art models in knowledge-guided cognitive reasoning and creative composition. These results support WiseEdit's value for evaluating higher-order image editing capabilities.

📝 Abstract

Recent image editing models boast next-level intelligent capabilities, facilitating cognition- and creativity-informed image editing. Yet existing benchmarks provide too narrow a scope for evaluation and fail to holistically assess these advanced abilities. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring both task depth and knowledge breadth. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps, i.e., Awareness, Interpretation, and Imagination, each corresponding to a task that challenges models at that specific step. It also encompasses complex tasks, in which none of the three steps is easy to complete. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive knowledge. In total, WiseEdit comprises 1,220 test cases, objectively revealing the limitations of SoTA image editing models in knowledge-based cognitive reasoning and creative composition. The benchmark, evaluation code, and the generated images of each model will be made publicly available soon. Project Page: https://qnancy.github.io/wiseedit_project_page/.
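The structure described above can be pictured as tagged test-case records with per-category score aggregation. The structure consists of three cascaded steps plus complex tasks, crossed with three knowledge types. The sketch below is purely illustrative. The names `EditCase`, `Task`, `Knowledge`, and `mean_score_by` are hypothetical, as are the field layout, the assumption that each case carries exactly one task label and one knowledge label, and the use of numeric per-case scores. None of these are WiseEdit's released data format or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable


class Task(Enum):
    """Task types: one per cascaded step, plus complex tasks."""
    AWARENESS = "Awareness"
    INTERPRETATION = "Interpretation"
    IMAGINATION = "Imagination"
    COMPLEX = "Complex"  # none of the three steps is easy to complete


class Knowledge(Enum):
    """The three fundamental knowledge types named in the abstract."""
    DECLARATIVE = "Declarative"
    PROCEDURAL = "Procedural"
    METACOGNITIVE = "Metacognitive"


@dataclass(frozen=True)
class EditCase:
    """Hypothetical record for one test case (not WiseEdit's real schema)."""
    case_id: str
    task: Task
    knowledge: Knowledge  # assumption: exactly one knowledge label per case
    instruction: str      # editing instruction given to the model
    source_image: str     # path to the input image (hypothetical layout)


def mean_score_by(
    cases: Iterable[EditCase],
    scores: Dict[str, float],
    key: Callable[[EditCase], Hashable],
) -> Dict[Hashable, float]:
    """Average per-case scores within each group returned by `key`.

    `scores` maps case_id -> score; cases without a score are skipped.
    How scores are produced (raters, a model judge, ...) is out of scope.
    """
    buckets = defaultdict(list)
    for case in cases:
        if case.case_id in scores:
            buckets[key(case)].append(scores[case.case_id])
    return {group: mean(values) for group, values in buckets.items()}
```

For example, `mean_score_by(cases, scores, key=lambda c: c.task)` gives a per-step breakdown. Passing `key=lambda c: (c.task, c.knowledge)` gives a task-by-knowledge grid instead. Grouping through a key function keeps a single aggregation routine for every slice of the benchmark.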

Problem

Existing image editing benchmarks are too narrow in scope to holistically assess cognition- and creativity-informed editing

They lack both task depth (the Awareness, Interpretation, and Imagination steps) and knowledge breadth

The limitations of state-of-the-art models in knowledge-based cognitive reasoning and creative composition remain unmeasured

Innovation

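Introduces the WiseEdit benchmark (1,220 test cases) for comprehensive evaluation of cognition- and creativity-informed image editing

Decomposes editing into three cascaded steps (Awareness, Interpretation, and Imagination), plus complex tasks

Incorporates three knowledge types: Declarative, Procedural, and Metacognitive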