Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative AI struggles to model the abstract logic underlying visual metaphors, often reproducing surface appearance at the pixel level without genuine metaphorical creativity. To address this limitation, this work introduces the Visual Metaphor Transfer (VMT) task and formalizes Conceptual Blending Theory into a Schema Grammar (G) that structurally captures cross-domain relational invariants. A cognitively inspired multi-agent system, comprising perception, transfer, generation, and hierarchical diagnostic agents, uses closed-loop backtracking to decouple the “creative essence” of a reference image and transfer it to a target subject. Experiments and human evaluations show that the approach significantly outperforms state-of-the-art baselines in metaphor consistency, analogy appropriateness, and visual creativity, with potential applications in creative fields such as advertising and media.

📝 Abstract
A visual metaphor constitutes a high-order form of human creativity, employing cross-domain semantic fusion to transform abstract concepts into impactful visual rhetoric. Despite the remarkable progress of generative AI, existing models remain largely confined to pixel-level instruction alignment and surface-level appearance preservation, failing to capture the underlying abstract logic necessary for genuine metaphorical generation. To bridge this gap, we introduce the task of Visual Metaphor Transfer (VMT), which challenges models to autonomously decouple the "creative essence" from a reference image and re-materialize that abstract logic onto a user-specified target subject. We propose a cognitively inspired multi-agent framework that operationalizes Conceptual Blending Theory (CBT) through a novel Schema Grammar ("G"). This structured representation decouples relational invariants from specific visual entities, providing a rigorous foundation for cross-domain logic re-instantiation. Our pipeline executes VMT through a collaborative system of specialized agents: a perception agent that distills the reference into a schema; a transfer agent that maintains generic-space invariance to discover apt carriers; a generation agent for high-fidelity synthesis; and a hierarchical diagnostic agent that mimics a professional critic, performing closed-loop backtracking to identify and rectify errors across abstract logic, component selection, and prompt encoding. Extensive experiments and human evaluations demonstrate that our method significantly outperforms state-of-the-art (SOTA) baselines in metaphor consistency, analogy appropriateness, and visual creativity, paving the way for automated high-impact creative applications in advertising and media. Source code will be made publicly available.
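The four-agent pipeline with closed-loop backtracking described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Schema` fields, the `Diagnosis` type, the `run_vmt` function, and all `toy_*` agents are hypothetical names invented for illustration, and the toy agents are plain string functions standing in for real vision and language models. The only behaviour taken from the abstract is the control flow: the diagnostic agent blames one of three levels (abstract logic, component selection, prompt encoding), and the loop re-runs the pipeline from that level onward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Schema:
    # Hypothetical fields; the paper's actual Schema Grammar G is not given here.
    relation: str  # relational invariant shared across domains
    subject: str   # what the metaphor is about
    carrier: str   # visual entity chosen to embody the relation


@dataclass(frozen=True)
class Diagnosis:
    ok: bool
    stage: Optional[str] = None  # "logic", "component", or "prompt"
    feedback: str = ""


STAGES = ("logic", "component", "prompt")


def run_vmt(reference, target, perceive, transfer, encode, generate,
            diagnose, max_rounds=3):
    """Return (image, target_schema, rounds_used).

    After a failed diagnosis, only the blamed stage and the stages
    downstream of it are re-run in the next round.
    """
    restart, feedback = "logic", ""
    ref_schema = tgt_schema = image = None
    for round_no in range(1, max_rounds + 1):
        level = STAGES.index(restart)
        if level <= 0:  # abstract logic: re-read the reference
            ref_schema = perceive(reference, feedback)
        if level <= 1:  # component selection: pick a new carrier
            tgt_schema = transfer(ref_schema, target, feedback)
        prompt = encode(tgt_schema, feedback)  # prompt encoding always re-runs
        image = generate(prompt)
        verdict = diagnose(image, ref_schema, tgt_schema)
        if verdict.ok:
            return image, tgt_schema, round_no
        restart = verdict.stage or "prompt"
        feedback = verdict.feedback
    return image, tgt_schema, max_rounds


# Toy agents (assumptions for the demo only).
def toy_perceive(reference, feedback=""):
    return Schema("slow irreversible loss", "ice cap", "melting ice cream")


def toy_transfer(ref, target, feedback=""):
    carrier = "hourglass" if "carrier" in feedback else "clock"
    # The relation is copied unchanged: this is the invariant being transferred.
    return Schema(ref.relation, target, carrier)


def toy_encode(schema, feedback=""):
    return f"{schema.subject} rendered as {schema.carrier}, conveying {schema.relation}"


def toy_generate(prompt):
    return f"<image: {prompt}>"


def toy_diagnose(image, ref, tgt):
    if tgt.carrier != "hourglass":
        return Diagnosis(False, "component", "pick a different carrier")
    return Diagnosis(True)


image, schema, rounds = run_vmt(
    "reference.png", "savings account",
    toy_perceive, toy_transfer, toy_encode, toy_generate, toy_diagnose,
)
```

In this toy run the first carrier is rejected at the component level, so the second round skips perception, re-runs transfer with the critic's feedback, and keeps the relational invariant fixed while the carrier changes.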

Problem

Research questions and friction points this paper is trying to address.

Visual Metaphor Transfer
Abstract Logic
Cross-domain Semantic Fusion
Generative AI
Creative Essence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Metaphor Transfer
Schema Grammar
Conceptual Blending Theory
Multi-Agent Reasoning
Abstract Logic Decoupling