WiseEdit: Benchmarking Cognition- and Creativity-Informed Image Editing

📅 2025-11-29
🤖 AI Summary
Existing image editing benchmarks are too narrow in scope to comprehensively assess models' cognitive and creative capabilities. To address this, the authors propose WiseEdit, a knowledge-intensive benchmark for image editing evaluation. Modeled on the human creative process, it decomposes editing into three cascaded steps (Awareness → Interpretation → Imagination) and systematically integrates three types of knowledge from cognitive science: declarative, procedural, and metacognitive. Combining this three-level cascaded task framework with knowledge-augmented test case construction yields 1,220 test cases, including complex tasks that require deep multimodal reasoning at every step. The benchmark supports fine-grained evaluation of semantic fidelity, structural coherence, and creative synthesis. Extensive experiments reveal significant deficiencies in current state-of-the-art models' knowledge-guided cognitive reasoning and combinatorial creativity, demonstrating WiseEdit's value for evaluating higher-order image editing capabilities.

📝 Abstract
Recent image editing models boast next-level intelligent capabilities, facilitating cognition- and creativity-informed image editing. Yet existing benchmarks are too narrow in scope to holistically assess these advanced abilities. To address this, we introduce WiseEdit, a knowledge-intensive benchmark for comprehensive evaluation of cognition- and creativity-informed image editing, featuring both depth in tasks and breadth in knowledge. Drawing an analogy to human cognitive creation, WiseEdit decomposes image editing into three cascaded steps, i.e., Awareness, Interpretation, and Imagination, each paired with a task that challenges models at that specific step. It also includes complex tasks in which none of the three steps is easy to complete. Furthermore, WiseEdit incorporates three fundamental types of knowledge: Declarative, Procedural, and Metacognitive. In total, WiseEdit comprises 1,220 test cases, which objectively reveal the limitations of state-of-the-art (SoTA) image editing models in knowledge-based cognitive reasoning and creative composition. The benchmark, evaluation code, and the images generated by each model will be made publicly available soon. Project Page: https://qnancy.github.io/wiseedit_project_page/.
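
To make the benchmark's two organizing axes (editing step and knowledge type) concrete, here is a minimal Python sketch of how test cases could be indexed along them and checked for coverage. The EditCase class, its fields, and the assumption that each case carries exactly one knowledge label are hypothetical illustrations, not WiseEdit's released data format.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Step(Enum):
    # The three cascaded steps named in the abstract, plus the complex
    # task type in which none of the three steps is easy.
    AWARENESS = "awareness"
    INTERPRETATION = "interpretation"
    IMAGINATION = "imagination"
    COMPLEX = "complex"


class Knowledge(Enum):
    # The three fundamental knowledge types named in the abstract.
    DECLARATIVE = "declarative"
    PROCEDURAL = "procedural"
    METACOGNITIVE = "metacognitive"


@dataclass(frozen=True)
class EditCase:
    # Hypothetical schema; the released benchmark may use other fields.
    case_id: str
    task: Step
    knowledge: Knowledge  # assumption: one knowledge label per case
    instruction: str
    source_image: str  # hypothetical path to the input image


def coverage(cases: list[EditCase]) -> Counter:
    """Count test cases in each (task, knowledge) cell."""
    return Counter((c.task, c.knowledge) for c in cases)


def empty_cells(cases: list[EditCase]) -> list[tuple[Step, Knowledge]]:
    """List every (task, knowledge) combination that has no test case."""
    counts = coverage(cases)
    return [(s, k) for s, k in product(Step, Knowledge) if counts[(s, k)] == 0]
```

Indexing cases by both axes makes it straightforward to report results per step and per knowledge type, and to spot combinations a benchmark leaves untested.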
Problem

Existing image editing benchmarks are too narrow in scope to holistically assess cognition- and creativity-informed editing
Prior evaluations lack both task depth across the editing process and breadth of required knowledge
The limitations of state-of-the-art models in knowledge-based reasoning and creative composition have not been systematically measured
Innovation

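Introduces WiseEdit, a knowledge-intensive benchmark of 1,220 test cases for comprehensive image editing evaluation
Decomposes editing into three cascaded steps (Awareness, Interpretation, and Imagination), plus complex tasks that challenge all three
Incorporates Declarative, Procedural, and Metacognitive knowledge types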