Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

📅 2025-01-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address coarse-grained motion control, poor temporal coherence, and degraded visual quality in video generation, this paper proposes a plug-and-play fine-grained motion control framework that requires no architectural modification or model retraining. The method introduces: (1) an optical-flow-guided real-time noise-warping preprocessing step, enabling differentiable spatiotemporal deformation of latent noise; (2) a warped-noise sampling mechanism that preserves temporal consistency while maintaining spatial Gaussianity; and (3) unified support for localized object motion, global camera motion, and cross-video motion transfer. Applied as a lightweight fine-tuning module to state-of-the-art video diffusion models (e.g., SVD), the approach significantly improves the trade-off between motion controllability and frame fidelity across multiple benchmarks. User studies confirm realistic, natural zero-shot motion editing and demonstrate efficient adaptability with minimal fine-tuning overhead.
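To make the noise-warping idea concrete, here is a minimal sketch of flow-guided noise warping in NumPy. It is not the paper's actual algorithm (the released implementation is more sophisticated and runs in real time); it only illustrates the core idea of pushing per-pixel Gaussian noise along an optical flow field while keeping each pixel marginally Gaussian. The function name `warp_noise` and the nearest-neighbour forward-warping scheme are illustrative assumptions.

```python
import numpy as np

def warp_noise(prev_noise: np.ndarray, flow: np.ndarray,
               rng: np.random.Generator) -> np.ndarray:
    """Push per-pixel Gaussian noise along an optical flow field.

    prev_noise: (H, W) i.i.d. standard Gaussian noise for frame t.
    flow:       (H, W, 2) flow from frame t to frame t+1, in pixels.
    Returns (H, W) noise for frame t+1 that follows the flow yet stays
    marginally standard Gaussian at every pixel.
    """
    H, W = prev_noise.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest-neighbour destination of every source pixel, clipped to the frame.
    xd = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    yd = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)

    acc = np.zeros((H, W))  # sum of noise values landing on each destination
    cnt = np.zeros((H, W))  # how many source pixels landed there
    np.add.at(acc, (yd, xd), prev_noise)
    np.add.at(cnt, (yd, xd), 1.0)

    # Disoccluded pixels (no source landed there) get fresh i.i.d. noise.
    out = rng.standard_normal((H, W))
    hit = cnt > 0
    # A sum of k unit-variance Gaussians has variance k; dividing by sqrt(k)
    # restores unit variance, preserving per-pixel Gaussianity.
    out[hit] = acc[hit] / np.sqrt(cnt[hit])
    return out
```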

📝 Abstract
Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow/; source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
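As a usage sketch under the same assumptions, chaining the hypothetical `warp_noise` above over a clip's flow fields yields a temporally correlated noise video; the paper fine-tunes a video diffusion model so that such warped noise, used in place of i.i.d. per-frame noise, steers the generated motion. The driver below only constructs the noise; the flow fields (e.g., from an off-the-shelf estimator) and the diffusion sampler that would consume the result are outside the sketch.

```python
def warped_noise_video(flows, height, width, seed=0):
    """Build a (T, H, W) temporally correlated Gaussian noise clip.

    flows: sequence of (H, W, 2) optical-flow fields mapping frame t
           to frame t+1, estimated by any off-the-shelf method.
    The result can replace the i.i.d. per-frame noise a video diffusion
    sampler would otherwise draw as its initial latent.
    """
    rng = np.random.default_rng(seed)
    frames = [rng.standard_normal((height, width))]  # frame 0: plain noise
    for flow in flows:
        frames.append(warp_noise(frames[-1], flow, rng))
    return np.stack(frames)
```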
Problem

Research questions and friction points this paper is trying to address.

Action Control
Video Generation
Coherence Maintenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Noise Control
Video Action Optimization
Efficient Algorithm
👥 Authors
Ryan Burgert
Netflix Eyeline Studios, Stony Brook University
Yuancheng Xu
Netflix Eyeline Studios, University of Maryland
Wenqi Xian
Netflix Eyeline Studios
Computer Vision · Computer Graphics
Oliver Pilarski
Netflix Eyeline Studios
Pascal Clausen
Netflix Eyeline Studios
Mingming He
Netflix
Computer Vision · Computer Graphics
Li Ma
Netflix Eyeline Studios
Yitong Deng
Netflix, Stanford University
Lingxiao Li
Netflix
Mohsen Mousavi
Netflix Eyeline Studios
Michael Ryoo
Stony Brook University
Paul E. Debevec
Netflix Eyeline Studios
Ning Yu
Netflix Eyeline Studios