CHIMERA: Adaptive Cache Injection and Semantic Anchor Prompting for Zero-shot Image Morphing with Morphing-oriented Metrics

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion-based image morphing often lacks smoothness and semantic consistency, yielding abrupt transitions or over-saturated appearances. To address this, we propose CHIMERA, a zero-shot image morphing framework: it caches multi-level features (down, mid, and up blocks) from both inputs during DDIM inversion and re-injects them through an adaptive cache injection mechanism during denoising, preserving structural integrity. In addition, a shared semantic anchor prompt, generated by a vision-language model, bridges dissimilar inputs and guides the denoising process toward semantic alignment. We further introduce the Global-Local Consistency Score (GLCS), a morphing-oriented metric that jointly evaluates the global harmonization of the two inputs and the local smoothness of the transition. Experiments and user studies show smoother and more semantically aligned transitions than existing methods, achieving state-of-the-art performance in image morphing.

📝 Abstract
Diffusion models exhibit remarkable generative ability, yet achieving smooth and semantically consistent image morphing remains a challenge. Existing approaches often yield abrupt transitions or over-saturated appearances due to the lack of adaptive structural and semantic alignment. We propose CHIMERA, a zero-shot diffusion-based framework that formulates morphing as a cached inversion-guided denoising process. To handle large semantic and appearance disparities, we propose Adaptive Cache Injection and Semantic Anchor Prompting. Adaptive Cache Injection (ACI) caches down-, mid-, and up-block features from both inputs during DDIM inversion and re-injects them adaptively during denoising, enabling spatial and semantic alignment in a depth- and time-adaptive manner and yielding natural feature fusion and smooth transitions. Semantic Anchor Prompting (SAP) leverages a vision-language model to generate a shared anchor prompt that bridges dissimilar inputs and guides the denoising process toward coherent results. Finally, we introduce the Global-Local Consistency Score (GLCS), a morphing-oriented metric that simultaneously evaluates the global harmonization of the two inputs and the smoothness of the local morphing transition. Extensive experiments and user studies show that CHIMERA achieves smoother and more semantically aligned transitions than existing methods, establishing a new state of the art in image morphing. The code and project page will be publicly released.
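The cache-and-re-inject idea behind ACI can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the injection rule, so the weighting schedule, the per-block gains, and the names `injection_weight`, `blend_cached`, and `inject` are all hypothetical, and plain lists of floats stand in for U-Net feature maps.

```python
from typing import Dict, List

Feature = List[float]  # stand-in for a U-Net feature map


def injection_weight(block: str, step: int, num_steps: int) -> float:
    """Hypothetical depth- and time-adaptive weight (an assumption, not the paper's rule).

    Injection is strongest at the first denoising step and fades linearly
    to zero; the mid block is weighted more heavily than down/up blocks.
    """
    depth_gain = {"down": 0.5, "mid": 1.0, "up": 0.5}[block]
    time_gain = 1.0 - step / max(num_steps - 1, 1)
    return depth_gain * time_gain


def blend_cached(cache_a: Dict[str, Feature], cache_b: Dict[str, Feature],
                 alpha: float) -> Dict[str, Feature]:
    """Linearly mix the features cached from the two inputs at morph position alpha."""
    return {
        block: [(1.0 - alpha) * a + alpha * b
                for a, b in zip(cache_a[block], cache_b[block])]
        for block in cache_a
    }


def inject(current: Feature, cached: Feature, block: str,
           step: int, num_steps: int) -> Feature:
    """Pull the current denoising feature toward the cached one."""
    w = injection_weight(block, step, num_steps)
    return [(1.0 - w) * c + w * f for c, f in zip(current, cached)]


# Usage: features cached from inputs A and B, morph position alpha = 0.5.
cache_a = {"down": [0.0, 0.0], "mid": [0.0], "up": [0.0]}
cache_b = {"down": [2.0, 4.0], "mid": [2.0], "up": [2.0]}
mixed = blend_cached(cache_a, cache_b, alpha=0.5)
first_step = inject([0.0, 0.0], mixed["down"], "down", step=0, num_steps=10)
```

In this sketch the cached features dominate early denoising steps, where coarse structure is decided, and fade out later so the model can refine details freely. Whether CHIMERA uses a schedule of this shape is an assumption.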
Problem

Research questions and friction points this paper is trying to address.

Achieving smooth, semantically consistent image morphing with diffusion models
Addressing abrupt transitions and over-saturated appearances in morphing
Evaluating global harmonization and local smoothness in morphing transitions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Cache Injection for feature alignment
Semantic Anchor Prompting for semantic bridging
Global-Local Consistency Score for morphing evaluation
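Adaptive Cache Injection depends on DDIM inversion to obtain the features it caches. As background, the sketch below shows the standard deterministic DDIM update applied in both directions on a scalar "latent". The constant noise predictor, the schedule values, and the function names are illustrative assumptions; a real pipeline would call a U-Net here and, per the abstract, cache block features rather than latents.

```python
import math
from typing import Callable, List

NoiseModel = Callable[[float, int], float]  # (latent, step index) -> predicted noise


def ddim_step(x: float, eps: float, a_from: float, a_to: float) -> float:
    """Deterministic DDIM update between cumulative alpha levels a_from and a_to."""
    x0 = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)  # predicted clean sample
    return math.sqrt(a_to) * x0 + math.sqrt(1.0 - a_to) * eps


def invert(x: float, alphas: List[float], eps_model: NoiseModel) -> List[float]:
    """Run DDIM inversion (clean -> noisy) and record every intermediate latent."""
    trajectory = [x]
    for i in range(len(alphas) - 1):
        x = ddim_step(x, eps_model(x, i), alphas[i], alphas[i + 1])
        trajectory.append(x)
    return trajectory


def denoise(x: float, alphas: List[float], eps_model: NoiseModel) -> float:
    """Run the same update in reverse (noisy -> clean)."""
    for i in range(len(alphas) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, i - 1), alphas[i], alphas[i - 1])
    return x


# Toy setup: decreasing cumulative alphas and a constant noise prediction.
alphas = [1.0, 0.9, 0.5, 0.1]
toy_model: NoiseModel = lambda x, i: 0.3
trajectory = invert(1.0, alphas, toy_model)
reconstructed = denoise(trajectory[-1], alphas, toy_model)
```

With a constant noise prediction the round trip is exact up to floating-point error. With a learned predictor, inversion is only approximate because the noise estimate changes along the trajectory.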
Dahyeon Kye
Chung-Ang University
Jeahun Sung
Chung-Ang University
MinKyu Jeon
Princeton University
Jihyong Oh
Assistant Prof. @ Chung-Ang Univ. (CAU), PhD/MS/BS @ KAIST
Computer Vision, Image/Video Processing, Deep Learning, Gen AI