Motion-Aware Video Generative Model

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video diffusion models rely solely on statistical learning and lack explicit modeling of physical motion laws, resulting in non-physical motion artifacts in generated videos. Method: This work is the first to systematically identify the distinctive frequency-domain spectral signatures of fundamental motions, including translation, rotation, and scaling. Leveraging these insights, we propose a physics-guided motion loss function and a zero-initialized frequency-domain enhancement module, both integrated into training as differentiable physical constraints. Our approach is fully compatible with mainstream video diffusion architectures and requires no modifications to the inference pipeline. Results: Extensive experiments demonstrate significant improvements in motion realism and physical plausibility, while preserving visual fidelity and semantic consistency. The method exhibits strong generalization across multiple benchmarks and incurs no additional inference overhead.

📝 Abstract
Recent advances in diffusion-based video generation have yielded unprecedented quality in visual content and semantic coherence. However, current approaches predominantly rely on statistical learning from vast datasets without explicitly modeling the underlying physics of motion, resulting in subtle yet perceptible non-physical artifacts that diminish the realism of generated videos. This paper introduces a physics-informed frequency domain approach to enhance the physical plausibility of generated videos. We first conduct a systematic analysis of the frequency-domain characteristics of diverse physical motions (translation, rotation, scaling), revealing that each motion type exhibits distinctive and identifiable spectral signatures. Building on this theoretical foundation, we propose two complementary components: (1) a physical motion loss function that quantifies and optimizes the conformity of generated videos to ideal frequency-domain motion patterns, and (2) a frequency domain enhancement module that progressively learns to adjust video features to conform to physical motion constraints while preserving original network functionality through a zero-initialization strategy. Experiments across multiple video diffusion architectures demonstrate that our approach significantly enhances motion quality and physical plausibility without compromising visual quality or semantic alignment. Our frequency-domain physical motion framework generalizes effectively across different video generation architectures, offering a principled approach to incorporating physical constraints into deep learning-based video synthesis pipelines. This work seeks to establish connections between data-driven models and physics-based motion models.
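The "distinctive and identifiable spectral signatures" the abstract refers to can be illustrated for the simplest case, translation, via the classical Fourier shift theorem: shifting a frame spatially leaves the magnitude spectrum unchanged and imprints a linear phase ramp. A minimal sketch (not the paper's code; plain NumPy, circular shifts for simplicity):

```python
import numpy as np

# Fourier shift theorem: spatial translation of a frame appears in the
# frequency domain as a pure phase ramp, leaving magnitudes unchanged.
# This is one example of a "spectral signature" of translational motion.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))

# Translate the frame by (dy, dx) pixels (circular shift for simplicity).
dy, dx = 3, 5
shifted = np.roll(np.roll(frame, dy, axis=0), dx, axis=1)

F0 = np.fft.fft2(frame)
F1 = np.fft.fft2(shifted)

# Magnitude spectra match: translation only changes phase.
assert np.allclose(np.abs(F0), np.abs(F1))

# The phase difference equals the theoretical ramp exp(-2*pi*i*(u*dy + v*dx)),
# where u, v are the normalized frequencies k/N along each axis.
u = np.fft.fftfreq(64)[:, None]
v = np.fft.fftfreq(64)[None, :]
ramp = np.exp(-2j * np.pi * (u * dy + v * dx))
assert np.allclose(F1, F0 * ramp)
```

Rotation and scaling leave analogous (rotated and reciprocally scaled) traces in the magnitude spectrum, which is what makes motion types separable in the frequency domain.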
Problem

Research questions and friction points this paper is trying to address.

Enhancing physical plausibility in generated videos
Addressing non-physical artifacts in video generation
Integrating physics-based motion models into deep learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physics-informed frequency domain approach
Physical motion loss function
Frequency domain enhancement module
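The zero-initialization strategy mentioned above guarantees that the enhancement module is an exact identity at the start of training, so the pretrained backbone's behavior is untouched. A minimal NumPy sketch of this pattern (an assumption about the module's structure, following the common ControlNet-style zero-projection design; the class and weights are hypothetical, not the paper's implementation):

```python
import numpy as np

class ZeroInitFreqEnhancer:
    """Hypothetical frequency-domain enhancement branch whose final
    projection is zero-initialized, so it adds nothing at init time."""

    def __init__(self, channels):
        rng = np.random.default_rng(0)
        self.w_in = rng.standard_normal((channels, channels)) * 0.02
        self.w_out = np.zeros((channels, channels))  # zero-initialized

    def __call__(self, feats):
        # Work on the frequency representation of the feature map.
        spec = np.fft.fft2(feats, axes=(-2, -1))
        h = np.tanh(np.einsum('cij,dc->dij', spec.real, self.w_in))
        delta = np.einsum('cij,dc->dij', h, self.w_out)  # exactly zero at init
        return feats + delta  # identity mapping until w_out is trained

x = np.random.default_rng(1).random((4, 8, 8))
mod = ZeroInitFreqEnhancer(4)
assert np.allclose(mod(x), x)  # original network functionality preserved
```

As `w_out` is updated by gradient descent, the branch gradually injects frequency-domain corrections without ever having perturbed the pretrained features at initialization.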
Bowen Xue
Undergraduate Student, University of Science and Technology of China
Video Generation · Image Generation
G. C. Guarnera
University of York, UK
Shuang Zhao
University of California, Irvine, USA
Z. Montazeri
University of Manchester, UK