MotionPro: A Precise Motion Controller for Image-to-Video Generation

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image-to-video (I2V) methods rely on large Gaussian kernels for motion modeling and lack an explicit representation of motion regions and types, resulting in coarse-grained motion control and an inability to distinguish object motion from camera motion. To address this, we propose a fine-grained motion control framework with three key contributions: (1) a region-level motion trajectory modeling approach coupled with dual-path disentanglement to recognize object versus camera motion; (2) MC-Bench, the first user-annotated, fine-grained motion control benchmark (1.1K samples); and (3) a diffusion model enhancement technique integrating optical flow estimation, trajectory sampling, motion mask generation, and feature modulation. Evaluated on WebVid-10M and MC-Bench, our method significantly improves motion accuracy and controllability, enabling region-level motion editing and motion semantic classification.

📝 Abstract
Animating images with interactive motion control has garnered popularity for image-to-video (I2V) generation. Modern approaches typically rely on large Gaussian kernels to extend motion trajectories as the condition without explicitly defining the movement region, leading to coarse motion control and failing to disentangle object and camera motion. To alleviate this, we present MotionPro, a precise motion controller that leverages a region-wise trajectory and a motion mask to regulate fine-grained motion synthesis and identify the target motion category (i.e., object or camera motion), respectively. Technically, MotionPro first estimates the flow maps of each training video via a tracking model, and then samples the region-wise trajectories to simulate the inference scenario. Instead of extending flow through large Gaussian kernels, our region-wise trajectory approach enables more precise control by directly utilizing trajectories within local regions, thereby effectively characterizing fine-grained movements. A motion mask is simultaneously derived from the predicted flow maps to capture the holistic motion dynamics of the movement regions. To pursue natural motion control, MotionPro further strengthens video denoising by incorporating both the region-wise trajectories and the motion mask through feature modulation. More remarkably, we meticulously construct a benchmark, i.e., MC-Bench, with 1.1K user-annotated image-trajectory pairs, for the evaluation of both fine-grained and object-level I2V motion control. Extensive experiments conducted on WebVid-10M and MC-Bench demonstrate the effectiveness of MotionPro. Please refer to our project page for more results: https://zhw-zhang.github.io/MotionPro-page/.
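The training-data preparation described in the abstract (estimate dense flow, threshold it into a motion mask, and sample sparse region-wise trajectory seeds inside the moving regions) can be sketched as follows. This is a minimal illustration under assumed shapes and a hypothetical magnitude threshold, not the paper's exact pipeline; the function name and parameters are my own.

```python
import numpy as np

def motion_mask_and_trajectories(flow, mag_thresh=1.0, n_tracks=8, seed=0):
    """Derive a binary motion mask from a dense flow map and sample sparse
    region-wise trajectory seeds inside the moving regions (sketch only;
    the paper's tracking model and sampling strategy are not specified here).

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements.
    Returns (mask, points): mask is an (H, W) bool array, points is an
    (n, 2) array of sampled (row, col) seed locations.
    """
    mag = np.linalg.norm(flow, axis=-1)   # per-pixel flow magnitude
    mask = mag > mag_thresh               # moving vs. static regions
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return mask, np.empty((0, 2), dtype=int)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(ys), size=min(n_tracks, len(ys)), replace=False)
    points = np.stack([ys[idx], xs[idx]], axis=1)
    return mask, points

# toy flow field: an 8x8 patch moving right inside a static 32x32 frame
flow = np.zeros((32, 32, 2))
flow[8:16, 8:16] = [3.0, 0.0]
mask, pts = motion_mask_and_trajectories(flow)
```

On the toy field, the mask covers exactly the moving patch and all sampled trajectory seeds fall inside it, which is the property the region-wise formulation relies on: trajectories condition the model only where motion actually occurs, rather than being smeared outward by a Gaussian kernel.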
Problem

Research questions and friction points this paper is trying to address.

Existing I2V methods extend trajectories with large Gaussian kernels, yielding only coarse-grained motion control
Movement regions are not explicitly defined, so fine-grained motion cannot be characterized
Object motion and camera motion are entangled and cannot be distinguished during synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Region-wise trajectory for fine-grained motion control
Motion mask to identify target motion category
Feature modulation with trajectories and motion mask
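The third innovation, injecting the trajectory and motion-mask conditions into the denoiser via feature modulation, can be illustrated with a FiLM-style scale-and-shift sketch. The shapes, projection matrices, and function name below are assumptions for illustration; the paper does not specify this exact design here.

```python
import numpy as np

def modulate_features(features, cond, w_scale, w_shift):
    """FiLM-style feature modulation sketch: a motion-condition embedding
    (e.g., encoded trajectories and motion mask) predicts a per-channel
    scale and shift applied to denoiser feature maps.

    features: (C, H, W) feature map; cond: (D,) condition embedding.
    w_scale, w_shift: (C, D) learned projection matrices (here arbitrary).
    """
    scale = w_scale @ cond   # (C,) per-channel scale offsets
    shift = w_shift @ cond   # (C,) per-channel shifts
    return features * (1.0 + scale[:, None, None]) + shift[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))
cond = rng.normal(size=(6,))
# with zero projections the modulation reduces to the identity map
out = modulate_features(feat, cond, np.zeros((4, 6)), np.zeros((4, 6)))
```

Using the residual form `1 + scale` is a common choice for conditioning layers: with zero-initialized projections the modulation starts as the identity, so the pretrained denoiser's behavior is preserved before the motion conditions are learned.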