Mojito: Motion Trajectory and Intensity Control for Video Generation

📅 2024-12-12

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

222K/year

🤖 AI Summary

To address weak motion controllability in text-to-video generation, this paper proposes a training-free directional motion control framework. Methodologically, it introduces a plug-and-play Directional Motion Control module for precise trajectory guidance and, for the first time in diffusion-based video generation, incorporates a RAFT-based Motion Intensity Modulator to jointly decouple and regulate motion direction and intensity. The approach integrates diffusion modeling, cross-attention mechanisms, text-video conditional generation, and explicit motion feature injection. Experiments demonstrate state-of-the-art motion controllability across multiple benchmarks: trajectory alignment error is reduced by 37%, motion intensity supports five-level fine-grained adjustment, and inference speed is accelerated by 8.2× compared to fine-tuning-based alternatives.

Technology Category

Application Category

📝 Abstract

Recent advancements in diffusion models have shown great promise in producing high-quality video content. However, efficiently training video diffusion models capable of integrating directional guidance and controllable motion intensity remains a challenging and under-explored area. To tackle these challenges, this paper introduces Mojito, a diffusion model that incorporates both motion trajectory and intensity control for text-to-video generation. Specifically, Mojito features a Directional Motion Control (DMC) module that leverages cross-attention to efficiently direct the generated object's motion without training, alongside a Motion Intensity Modulator (MIM) that uses optical flow maps generated from videos to guide varying levels of motion intensity. Extensive experiments demonstrate Mojito's effectiveness in achieving precise trajectory and intensity control with high computational efficiency, generating motion patterns that closely match specified directions and intensities, providing realistic dynamics that align well with natural motion in real-world scenarios.

Problem

Research questions and friction points this paper is trying to address.

Enhance video diffusion model training

Integrate directional motion guidance

Control motion intensity for realism

Innovation

Methods, ideas, or system contributions that make the work stand out.

Directional Motion Control module

Motion Intensity Modulator

Optical flow maps guidance

🔎 Similar Papers

Tora: Trajectory-oriented Diffusion Transformer for Video Generation