Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise

📅 2025-01-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address coarse-grained motion control, poor temporal coherence, and degraded visual quality in video generation, this paper proposes a plug-and-play fine-grained motion control framework that requires no architectural modification or model retraining. The method introduces: (1) an optical-flow-guided real-time noise-warping preprocessing step, enabling differentiable spatiotemporal deformation of latent noise; (2) a warped-noise sampling mechanism that preserves temporal consistency while maintaining spatial Gaussianity; and (3) unified support for localized object motion, global camera motion, and cross-video motion transfer. Applied as a lightweight fine-tuning module to state-of-the-art video diffusion models (e.g., SVD), the approach significantly improves the trade-off between motion controllability and frame fidelity across multiple benchmarks. User studies confirm realistic, natural zero-shot motion editing and demonstrate efficient adaptability with minimal fine-tuning overhead.
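To make the noise-warping idea concrete, here is a minimal sketch of flow-guided noise warping in NumPy. It is not the paper's actual algorithm (the released implementation is more sophisticated and runs in real time); it only illustrates the core idea of pushing per-pixel Gaussian noise along an optical flow field while keeping each pixel marginally Gaussian. The function name `warp_noise` and the nearest-neighbour forward-warping scheme are illustrative assumptions.

```python
import numpy as np

def warp_noise(prev_noise: np.ndarray, flow: np.ndarray,
               rng: np.random.Generator) -> np.ndarray:
    """Push per-pixel Gaussian noise along an optical flow field.

    prev_noise: (H, W) i.i.d. standard Gaussian noise for frame t.
    flow:       (H, W, 2) flow from frame t to frame t+1, in pixels.
    Returns (H, W) noise for frame t+1 that follows the flow yet stays
    marginally standard Gaussian at every pixel.
    """
    H, W = prev_noise.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest-neighbour destination of every source pixel, clipped to the frame.
    xd = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    yd = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)

    acc = np.zeros((H, W))  # sum of noise values landing on each destination
    cnt = np.zeros((H, W))  # how many source pixels landed there
    np.add.at(acc, (yd, xd), prev_noise)
    np.add.at(cnt, (yd, xd), 1.0)

    # Disoccluded pixels (no source landed there) get fresh i.i.d. noise.
    out = rng.standard_normal((H, W))
    hit = cnt > 0
    # A sum of k unit-variance Gaussians has variance k; dividing by sqrt(k)
    # restores unit variance, preserving per-pixel Gaussianity.
    out[hit] = acc[hit] / np.sqrt(cnt[hit])
    return out
```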

📝 Abstract
Generative modeling aims to transform random noise into structured outputs. In this work, we enhance video diffusion models by allowing motion control via structured latent noise sampling. This is achieved by just a change in data: we pre-process training videos to yield structured noise. Consequently, our method is agnostic to diffusion model design, requiring no changes to model architectures or training pipelines. Specifically, we propose a novel noise warping algorithm, fast enough to run in real time, that replaces random temporal Gaussianity with correlated warped noise derived from optical flow fields, while preserving the spatial Gaussianity. The efficiency of our algorithm enables us to fine-tune modern video diffusion base models using warped noise with minimal overhead, and provide a one-stop solution for a wide range of user-friendly motion control: local object motion control, global camera movement control, and motion transfer. The harmonization between temporal coherence and spatial Gaussianity in our warped noise leads to effective motion control while maintaining per-frame pixel quality. Extensive experiments and user studies demonstrate the advantages of our method, making it a robust and scalable approach for controlling motion in video diffusion models. Video results are available on our webpage: https://vgenai-netflix-eyeline-research.github.io/Go-with-the-Flow/; source code and model checkpoints are available on GitHub: https://github.com/VGenAI-Netflix-Eyeline-Research/Go-with-the-Flow.
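As a usage sketch under the same assumptions, chaining the hypothetical `warp_noise` above over a clip's flow fields yields a temporally correlated noise video; the paper fine-tunes a video diffusion model so that such warped noise, used in place of i.i.d. per-frame noise, steers the generated motion. The driver below only constructs the noise; the flow fields (e.g., from an off-the-shelf estimator) and the diffusion sampler that would consume the result are outside the sketch.

```python
def warped_noise_video(flows, height, width, seed=0):
    """Build a (T, H, W) temporally correlated Gaussian noise clip.

    flows: sequence of (H, W, 2) optical-flow fields mapping frame t
           to frame t+1, estimated by any off-the-shelf method.
    The result can replace the i.i.d. per-frame noise a video diffusion
    sampler would otherwise draw as its initial latent.
    """
    rng = np.random.default_rng(seed)
    frames = [rng.standard_normal((height, width))]  # frame 0: plain noise
    for flow in flows:
        frames.append(warp_noise(frames[-1], flow, rng))
    return np.stack(frames)
```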
Problem

Research questions and friction points this paper is trying to address.

Action Control
Video Generation
Coherence Maintenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Noise Control
Video Action Optimization
Efficient Algorithm
👥 Authors
Ryan Burgert
Netflix Eyeline Studios, Stony Brook University
Yuancheng Xu
Netflix Eyeline Studios, University of Maryland
Wenqi Xian
Netflix Eyeline Studios
Computer Vision · Computer Graphics
Oliver Pilarski
Netflix Eyeline Studios
Pascal Clausen
Netflix Eyeline Studios
Mingming He
Netflix
Computer Vision · Computer Graphics
Li Ma
Netflix Eyeline Studios
Yitong Deng
Netflix, Stanford University
Lingxiao Li
Netflix
Mohsen Mousavi
Netflix Eyeline Studios
Michael Ryoo
Stony Brook University
Paul E. Debevec
Netflix Eyeline Studios
Ning Yu
Netflix Eyeline Studios