MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video generation methods suffer from inaccurate trajectory tracking, poor object consistency, and low visual fidelity under complex or multi-object motion control; moreover, they support only a single trajectory format and lack dedicated datasets and benchmarks. To address these limitations, we propose the first dense-to-sparse, three-level trajectory-controllable video generation framework, enabling motion guidance via masks, bounding boxes, and sparse boxes. We introduce MagicData, the first specialized dataset for trajectory-controlled video generation, and MagicBench, a comprehensive benchmark covering both video quality and trajectory control accuracy. Additionally, we design a multi-granularity spatial-conditioning injection mechanism and an automated trajectory annotation pipeline. Built upon diffusion models, our method achieves state-of-the-art performance on both trajectory accuracy and video quality metrics, significantly advancing the modeling of complex, multi-object motion.
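The three condition levels form a strict dense-to-sparse coarsening: each segmentation mask determines a bounding box, and a dense box trajectory determines a handful of keyframe boxes. Below is a minimal sketch of these conversions, assuming masks arrive as (T, H, W) binary arrays; masks_to_boxes, boxes_to_sparse, and the keep_every subsampling rule are illustrative names and choices, not the paper's API.

```python
import numpy as np

def masks_to_boxes(masks: np.ndarray) -> np.ndarray:
    """Densest -> middle level: per-frame binary masks (T, H, W)
    to per-frame boxes (T, 4) as (x_min, y_min, x_max, y_max)."""
    boxes = np.zeros((masks.shape[0], 4), dtype=np.int64)
    for t, mask in enumerate(masks):
        ys, xs = np.nonzero(mask)
        if xs.size:  # leave an all-zero box for frames with an empty mask
            boxes[t] = (xs.min(), ys.min(), xs.max() + 1, ys.max() + 1)
    return boxes

def boxes_to_sparse(boxes: np.ndarray, keep_every: int = 8) -> dict[int, np.ndarray]:
    """Middle -> sparsest level: keep boxes on a few keyframes only.
    Uniform subsampling is an assumption; any keyframe picker would do."""
    return {t: boxes[t] for t in range(len(boxes)) if t % keep_every == 0}
```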

📝 Abstract
Recent advances in video generation have led to remarkable improvements in visual quality and temporal coherence. Building on this progress, trajectory-controllable video generation has emerged to enable precise object motion control through explicitly defined spatial paths. However, existing methods struggle with complex object movements and multi-object motion control, resulting in imprecise trajectory adherence, poor object consistency, and compromised visual quality. Furthermore, these methods support trajectory control in only a single format, limiting their applicability in diverse scenarios. Additionally, there is no publicly available dataset or benchmark specifically tailored for trajectory-controllable video generation, hindering robust training and systematic evaluation. To address these challenges, we introduce MagicMotion, a novel image-to-video generation framework that enables trajectory control through three levels of conditions, from dense to sparse: masks, bounding boxes, and sparse boxes. Given an input image and trajectories, MagicMotion seamlessly animates objects along the defined trajectories while maintaining object consistency and visual quality. Furthermore, we present MagicData, a large-scale trajectory-controlled video dataset, along with an automated pipeline for annotation and filtering. We also introduce MagicBench, a comprehensive benchmark that assesses both video quality and trajectory control accuracy across different numbers of objects. Extensive experiments demonstrate that MagicMotion outperforms previous methods across various metrics. Our project page is publicly available at https://quanhaol.github.io/magicmotion-site.
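The abstract credits MagicData to an automated annotation and filtering pipeline but gives no detail here. A hedged sketch of what such a pass might look like: segment_fn is a hypothetical stand-in for an off-the-shelf video segmenter, and the motion-based filter (dropping near-static objects) is an assumption about how uninformative clips could be screened out.

```python
import numpy as np

def bbox_of(mask: np.ndarray):
    """Tight (x_min, y_min, x_max, y_max) box of a binary mask, or None."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return np.array([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])

def annotate_and_filter(frames, segment_fn, min_motion_px: float = 8.0):
    """Illustrative annotation pass (segment_fn is hypothetical):
    segment objects across frames, then keep only objects whose box
    center actually moves, filtering out static trajectories."""
    kept = {}
    for obj_id, masks in segment_fn(frames).items():  # {id: (T, H, W)}
        boxes = [bbox_of(m) for m in masks]
        if any(b is None for b in boxes):
            continue  # object leaves the frame at some point; drop it
        centers = np.stack([(b[:2] + b[2:]) / 2.0 for b in boxes])
        if np.linalg.norm(centers[-1] - centers[0]) >= min_motion_px:
            kept[obj_id] = masks
    return kept
```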
Problem

Research questions and friction points this paper is trying to address.

Existing methods lack precise object motion control in video generation
Complex and multi-object motion leads to imprecise trajectory adherence, poor object consistency, and degraded visual quality
No public dataset or benchmark exists for trajectory-controllable video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dense-to-sparse trajectory control at three levels: masks, bounding boxes, and sparse boxes
MagicData: large-scale trajectory-controlled video dataset with an automated annotation and filtering pipeline
MagicBench: benchmark for both video quality and trajectory control accuracy across object counts (a metric sketch follows below)
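MagicBench reportedly scores both video quality and trajectory adherence, bucketed by the number of controlled objects. A minimal sketch of one plausible trajectory-accuracy metric under that setup, mean per-frame box IoU grouped by object count; box_iou, trajectory_accuracy, and the bucketing scheme are assumptions, not the benchmark's published protocol.

```python
import numpy as np

def box_iou(a, b) -> float:
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return float(inter / union) if union > 0 else 0.0

def trajectory_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean per-frame IoU between a generated and a target box
    trajectory, each given as a (T, 4) array for one object."""
    return float(np.mean([box_iou(p, t) for p, t in zip(pred, target)]))

def bucket_by_object_count(results):
    """Average per-video scores separately for each object count.
    results: iterable of (num_objects, score) pairs."""
    buckets = {}
    for n, score in results:
        buckets.setdefault(n, []).append(score)
    return {n: float(np.mean(s)) for n, s in sorted(buckets.items())}
```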