🤖 AI Summary
This paper addresses zero-shot image-to-camera-motion personalized video generation—transferring realistic camera motion from a reference video to an arbitrary user-specified scene without additional training data or fine-tuning. The method adopts a two-stage paradigm: (1) multi-concept LoRA jointly models spatiotemporal motion features under orthogonality constraints; (2) homography-based motion alignment refines cross-scene motion consistency. The authors introduce CameraScore, the first dedicated metric for evaluating camera motion fidelity. Quantitative experiments and user studies demonstrate significant improvements over baselines: CameraScore increases substantially, 90.45% of users prefer the generated motion fidelity, and 70.31% rate scene consistency as superior. The approach achieves high-fidelity, generalizable camera motion transfer with no per-scene adaptation.
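The orthogonality constraint on the multi-concept LoRA layers can be illustrated with a minimal sketch. The paper does not publish its exact loss, so the function name, matrix shapes, and penalty form below are assumptions: one plausible choice is to penalize the squared Frobenius norm of the cross-Gram matrix between the two concepts' LoRA down-projection matrices, pushing the scene and motion adapters toward orthogonal subspaces.

```python
import numpy as np

def orthogonality_loss(A_scene, A_motion):
    """Hypothetical orthogonality penalty between two LoRA down-projection
    matrices (shape: rank x hidden_dim). The cross-Gram matrix
    A_scene @ A_motion.T holds the pairwise inner products of their rows;
    its squared Frobenius norm is zero exactly when every scene direction
    is orthogonal to every motion direction.
    NOTE: the actual loss used by CamMimic is not specified here; this is
    an illustrative sketch, not the paper's implementation."""
    gram = A_scene @ A_motion.T          # (r_scene, r_motion) cross-correlations
    return float((gram ** 2).sum())      # 0.0 iff the row spaces are orthogonal
```

Driving this loss to zero encourages the two adapters to encode disjoint features, so the motion concept from the reference video does not leak appearance details into the scene concept, and vice versa.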
📝 Abstract
We introduce CamMimic, an innovative algorithm tailored for dynamic video editing needs. It is designed to seamlessly transfer the camera motion observed in a given reference video onto any scene of the user's choice in a zero-shot manner, without requiring any additional data. Our algorithm achieves this using a two-phase strategy that leverages a text-to-video diffusion model. In the first phase, we develop a multi-concept learning method that combines LoRA layers with an orthogonality loss to capture the underlying spatiotemporal characteristics of the reference video as well as the spatial features of the user's desired scene. The second phase proposes a unique homography-based refinement strategy to enhance the temporal and spatial alignment of the generated video. We demonstrate the efficacy of our method through experiments on a dataset pairing diverse scenes with reference videos spanning a variety of camera motions. In the absence of an established metric for assessing camera motion transfer between unrelated scenes, we propose CameraScore, a novel metric that utilizes homography representations to measure camera motion similarity between the reference and generated videos. Extensive quantitative and qualitative evaluations demonstrate that our approach generates high-quality, motion-enhanced videos. Additionally, a user study reveals that 70.31% of participants preferred our method for scene preservation, while 90.45% favored it for motion transfer. We hope this work lays the foundation for future advancements in camera motion transfer across different scenes.
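The homography-based comparison behind a CameraScore-style metric can be sketched as follows. The paper's exact formula is not reproduced here; this is an assumed variant in which each video is summarized by a sequence of frame-to-frame homography matrices (e.g. estimated with feature matching), and the score is the mean cosine similarity between the per-step homography parameter vectors of the reference and generated videos. The function names and the choice of cosine similarity are illustrative assumptions.

```python
import numpy as np

def homography_params(H):
    """Normalize a 3x3 homography by its projective scale (H[2,2] = 1)
    and return its 8 free parameters as a vector."""
    H = np.asarray(H, dtype=float)
    return (H / H[2, 2]).flatten()[:8]

def camera_score(ref_homs, gen_homs):
    """Hypothetical CameraScore-style similarity: mean cosine similarity
    between corresponding frame-to-frame homography parameter vectors of
    the reference and generated videos. Returns a value in [-1, 1], where
    1 means the per-frame camera transforms match exactly.
    NOTE: an illustrative sketch, not the paper's published metric."""
    scores = []
    for Hr, Hg in zip(ref_homs, gen_homs):
        pr, pg = homography_params(Hr), homography_params(Hg)
        scores.append(np.dot(pr, pg) / (np.linalg.norm(pr) * np.linalg.norm(pg)))
    return float(np.mean(scores))
```

Comparing homographies rather than raw pixels makes the metric agnostic to scene content, which is exactly what cross-scene motion transfer requires: two videos of entirely different scenes can still receive a high score if their camera trajectories agree.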