Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation

📅 2025-01-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image animation methods lack flexibility in video-level camera and object motion control, preventing fine-grained joint regulation of the two. To address this, we propose the first 3D-aware motion representation enabling adaptive-granularity joint modeling of camera and object motion. We introduce a novel “perception-as-control” paradigm, wherein user-intent-driven 3D motion is directly mapped to multi-view-perceivable visual changes, with perception-aware rendering outputs serving as a unified control signal. Our method constructs an implicit 3D motion field from a single input image, integrating intent parsing, multi-view rendering, and diffusion-based video synthesis modules. Extensive experiments on multiple benchmarks demonstrate significant improvements in motion controllability and cross-view consistency. In both qualitative and quantitative evaluations, the method surpasses state-of-the-art approaches, enabling robust generation of complex interactive animations.

📝 Abstract
Motion-controllable image animation is a fundamental task with a wide range of potential applications. Recent works have made progress in controlling camera or object motion via various motion representations, but they still struggle to support collaborative camera and object motion control with adaptive control granularity. To this end, we introduce 3D-aware motion representation and propose an image animation framework, called Perception-as-Control, to achieve fine-grained collaborative motion control. Specifically, we construct 3D-aware motion representation from a reference image, manipulate it based on interpreted user intentions, and perceive it from different viewpoints. In this way, camera and object motions are transformed into intuitive, consistent visual changes. Then, the proposed framework leverages the perception results as motion control signals, enabling it to support various motion-related video synthesis tasks in a unified and flexible way. Experiments demonstrate the superiority of the proposed framework. For more details and qualitative results, please refer to our project webpage: https://chen-yingjie.github.io/projects/Perception-as-Control.
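The pipeline the abstract describes (build a 3D-aware motion representation from a reference image, manipulate it per interpreted user intent, perceive it from different viewpoints, and use the perception results as control signals) can be sketched in code. Note this is a toy illustration, not the authors' implementation: every function name, tensor shape, and the unit-sphere point cloud are hypothetical stand-ins.

```python
# Hypothetical sketch of the Perception-as-Control pipeline from the abstract.
# All names, shapes, and data structures are illustrative assumptions.
import numpy as np

def build_motion_representation(image, n_points=256):
    """Stage 1: lift a reference image into a 3D-aware motion representation
    (toy stand-in: a point cloud sampled on the unit sphere)."""
    rng = np.random.default_rng(0)
    points = rng.normal(size=(n_points, 3))
    return points / np.linalg.norm(points, axis=1, keepdims=True)

def apply_user_intent(points, translation):
    """Stage 2: manipulate the representation according to interpreted user
    intent (toy example: a rigid translation of the object points)."""
    return points + np.asarray(translation)

def perceive_from_viewpoints(points, camera_positions):
    """Stage 3: perceive the manipulated 3D scene from several viewpoints,
    producing per-view 2D projections that act as motion control signals."""
    signals = []
    for cam in camera_positions:
        rel = points - np.asarray(cam)    # points in a camera-centred frame
        proj = rel[:, :2] / rel[:, 2:3]   # toy pinhole projection onto x/y
        signals.append(proj)
    return signals

def animate(image, intent, cameras):
    """Full pipeline: in the paper these projections would condition a
    diffusion-based video synthesis model; here we just return them."""
    pts = build_motion_representation(image)
    pts = apply_user_intent(pts, intent)
    return perceive_from_viewpoints(pts, cameras)

signals = animate(image=None, intent=[0.0, 0.0, 5.0],
                  cameras=[[0, 0, -1], [1, 0, -1]])
print(len(signals), signals[0].shape)  # one (n_points, 2) signal per view
```

The key design choice the paradigm makes is that both camera motion (moving the viewpoints) and object motion (moving the points) end up expressed in the same 2D control-signal space, which is what lets one model consume both uniformly.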
Problem

Research questions and friction points this paper is trying to address.

Image Animation Control
Camera Movement
Object Manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Technology
Video Production
Perception-as-Control Framework