🤖 AI Summary
This work addresses imprecise motion modeling and unnatural object motion in camera-controllable video generation. We propose FloVD, a flow-guided video diffusion model that explicitly integrates optical flow as a motion prior into the video diffusion framework: (1) a first stage generates structurally coherent optical flow fields; (2) a second stage synthesizes videos conditioned on these flows. Because background flow implicitly encodes 3D correlations across viewpoints, the model supports arbitrary 6-DoF camera control without requiring ground-truth camera parameters or paired annotations. Experiments show that FloVD achieves lower camera-trajectory tracking error and higher motion-naturalness scores than state-of-the-art methods, while preserving object-motion consistency and fine-grained camera controllability without per-scene supervision or explicit 3D priors.
📝 Abstract
This paper presents FloVD, a novel optical-flow-based video diffusion model for camera-controllable video generation. FloVD leverages optical flow maps to represent the motion of both the camera and moving objects, which offers two key benefits. First, since optical flow can be estimated directly from videos, our approach can use arbitrary training videos without ground-truth camera parameters. Second, since background optical flow encodes 3D correlation across different viewpoints, our method enables detailed camera control by leveraging the background motion. To synthesize natural object motion while supporting detailed camera control, our framework adopts a two-stage video synthesis pipeline consisting of optical flow generation followed by flow-conditioned video synthesis. Extensive experiments demonstrate the superiority of our method over previous approaches in terms of accurate camera control and natural object motion synthesis.
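The claim that background optical flow encodes 3D correlation across viewpoints follows from standard multi-view geometry: given per-pixel depth, camera intrinsics, and a relative camera pose, the camera-induced background flow is fully determined by back-projecting each pixel, transforming it to the new view, and re-projecting. The sketch below illustrates this relationship with NumPy under a pinhole camera model; the function name and setup are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def camera_induced_flow(depth, K, R, t):
    """Optical flow of static background induced by a camera move.

    depth : (H, W) per-pixel depth in the source view
    K     : (3, 3) pinhole intrinsics
    R, t  : relative rotation (3, 3) and translation (3,) to the target view
    Returns a (H, W, 2) flow field (illustrative sketch, not FloVD's code).
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates, shape (3, N)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T.astype(float)
    rays = np.linalg.inv(K) @ pix          # back-project to normalized rays
    pts = rays * depth.reshape(1, -1)      # 3D points in the source camera frame
    pts2 = R @ pts + t.reshape(3, 1)       # transform into the target camera frame
    proj = K @ pts2
    uv = proj[:2] / proj[2:]               # perspective divide
    return (uv - pix[:2]).T.reshape(h, w, 2)  # 2D displacement per pixel
```

For a pure x-translation `t = (tx, 0, 0)` with constant depth `d`, this yields a uniform flow of `fx * tx / d` pixels, showing how any 6-DoF trajectory maps to a dense background-flow condition.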