ATI: Any Trajectory Instruction for Controllable Video Generation

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video generation methods rely on separate modules to model camera motion, object translation, and local deformation, which limits holistic motion control. To address this, we propose a unified trajectory-driven motion control framework. Our approach represents multi-granularity motion, from global viewpoint changes to fine-grained local deformations, as editable keypoint trajectories, which are injected into the latent space of pre-trained image-to-video diffusion models via a lightweight motion injector. This enables end-to-end, plug-and-play motion conditioning without modifying the backbone architecture. Crucially, the method preserves temporal coherence and semantic alignment while simultaneously controlling both macro-scale camera motion and micro-scale deformation. Extensive experiments demonstrate state-of-the-art performance across motion sketching, dynamic viewpoint synthesis, and precise motion editing, achieving superior controllability, visual fidelity, and cross-model generalization.

📝 Abstract
We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion using trajectory-based inputs. In contrast to prior methods that address these motion types through separate modules or task-specific designs, our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models via a lightweight motion injector. Users can specify keypoints and their motion paths to control localized deformations, entire object motion, virtual camera dynamics, or combinations of these. The injected trajectory signals guide the generative process to produce temporally consistent and semantically aligned motion sequences. Our framework demonstrates superior performance across multiple video motion control tasks, including stylized motion effects (e.g., motion brushes), dynamic viewpoint changes, and precise local motion manipulation. Experiments show that our method provides significantly better controllability and visual quality compared to prior approaches and commercial solutions, while remaining broadly compatible with various state-of-the-art video generation backbones. Project page: https://anytraj.github.io/.
Problem

Research questions and friction points this paper addresses.

Prior methods model camera movement, object translation, and local deformation through separate modules or task-specific designs
No single interface lets users specify macro-scale camera dynamics and fine-grained local motion together
Trajectory conditioning must preserve temporal coherence and semantic alignment without modifying the generation backbone
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified trajectory-based motion control framework
Lightweight motion injector in latent space
Keypoint trajectories control local deformation, whole-object motion, and camera dynamics
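To make the injector idea concrete, here is a minimal sketch of trajectory conditioning under stated assumptions: the paper does not publish its implementation, so the function names (`trajectory_to_heatmaps`, `inject_motion`), the Gaussian-heatmap rasterization, and the additive channel-wise injection are all illustrative choices, not the authors' actual method.

```python
import numpy as np

def trajectory_to_heatmaps(trajectory, num_frames, height, width, sigma=2.0):
    """Rasterize one keypoint trajectory into per-frame Gaussian heatmaps.

    trajectory: list of (x, y) keypoint positions, one per frame.
    Returns an array of shape (num_frames, height, width) with a
    unit-peak Gaussian centered on the keypoint in each frame.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    maps = np.zeros((num_frames, height, width), dtype=np.float32)
    for t, (x, y) in enumerate(trajectory[:num_frames]):
        maps[t] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma**2))
    return maps

def inject_motion(latents, heatmaps, proj_weight):
    """Lightweight additive injection (hypothetical scheme).

    Projects the spatial heatmaps onto the latent channels and adds the
    result, leaving the backbone untouched (plug-and-play conditioning).
    latents: (T, C, H, W); heatmaps: (T, H, W); proj_weight: (C,).
    """
    cond = heatmaps[:, None] * proj_weight[None, :, None, None]
    return latents + cond
```

In a real system the projection would be a small learned module and the heatmaps would be downsampled to the latent resolution; the sketch only shows how a user-drawn trajectory can become a dense signal aligned with the video latents.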