FlowLoss: Dynamic Flow-Conditioned Loss Strategy for Video Diffusion Models

📅 2025-04-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Video diffusion models (VDMs) often suffer from motion inconsistency due to insufficient temporal modeling. To address this, we propose FlowLoss—a novel explicit optical flow matching loss that directly compares RAFT-estimated flow fields between generated and ground-truth videos, departing from conventional deformation-based implicit flow guidance. Furthermore, we introduce a noise-aware dynamic weighting mechanism that adaptively modulates the strength of optical flow supervision according to the noise level during the denoising process. Our method requires no auxiliary networks or post-processing modules. Evaluated on robotic video datasets, it significantly improves temporal motion consistency and accelerates training convergence. The core contribution lies in the first integration of explicit optical flow matching with noise-dependent dynamic weighting into a unified loss function—establishing a new paradigm for temporal modeling in VDMs.

Technology Category

Application Category

📝 Abstract
Video Diffusion Models (VDMs) can generate high-quality videos, but often struggle with producing temporally coherent motion. Optical flow supervision is a promising approach to address this, with prior works commonly employing warping-based strategies that avoid explicit flow matching. In this work, we explore an alternative formulation, FlowLoss, which directly compares flow fields extracted from generated and ground-truth videos. To account for the unreliability of flow estimation under high-noise conditions in diffusion, we propose a noise-aware weighting scheme that modulates the flow loss across denoising steps. Experiments on robotic video datasets suggest that FlowLoss improves motion stability and accelerates convergence in early training stages. Our findings offer practical insights for incorporating motion-based supervision into noise-conditioned generative models.
Problem

Research questions and friction points this paper is trying to address.

Improving temporal coherence in video diffusion models
Addressing unreliable flow estimation in noisy conditions
Enhancing motion stability and training convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directly compares flow fields from generated and real videos
Noise-aware weighting modulates flow loss across steps
Improves motion stability and early training convergence
🔎 Similar Papers
No similar papers found.