Dynamic Video Generation: Shaping Video Generation Across Time and Space

📅 2026-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video diffusion models incur substantial computational costs due to the large number of spatiotemporal tokens they process, and existing acceleration methods often struggle to balance efficiency with generation quality. This work proposes Dynamic Video Generation (DVG), a novel framework that, for the first time, enables joint dynamic computation allocation across both spatial and temporal dimensions. DVG employs a content-aware mechanism to automatically adjust resolution and frame rate without requiring manual hyperparameter tuning or model retraining. The framework is compatible with complementary acceleration techniques such as knowledge distillation and supports near-lossless speedup across diverse models and tasks. Evaluated on the HunyuanVideo model family, DVG achieves up to 7× acceleration alone and up to 18× when combined with distillation, while preserving high visual fidelity.
📝 Abstract
Diffusion models have achieved impressive performance in video generation, but their iterative denoising process remains computationally expensive due to the large number of tokens processed at each timestep. Recently, progressive resolution sampling has emerged as a promising acceleration approach by reducing latent resolution in early stages. However, scaling this idea to video generation remains challenging, as the additional temporal dimension introduces diverse spatio-temporal demands across different videos, and compressing only a single dimension often leads to limited acceleration or degraded quality. Therefore, we propose DVG, a Dynamic Video Generation framework that jointly allocates computation across time and space, automatically selecting content-aware acceleration strategies without manual tuning or retraining. DVG achieves near-lossless acceleration across models and tasks, reaching up to 7 times speedup on HunyuanVideo and HunyuanVideo-1.5, and 18 times when combined with distillation, demonstrating its potential as a key component in today's large-scale efficient video generation systems. Our code is included in the supplementary material and will be released on GitHub.
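To make the token arithmetic behind progressive resolution sampling concrete, here is a minimal sketch of how running early denoising steps on a smaller latent grid reduces compute. It is not DVG's content-aware selection mechanism, which the abstract does not specify. The names `Stage`, `token_count`, and `relative_cost`, the two-stage schedule, and the cost model (per-step cost proportional to token count raised to `attention_exponent`) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One phase of a hypothetical progressive sampling schedule."""
    steps: int             # denoising steps run in this phase
    spatial_scale: float   # fraction of full latent height and width
    temporal_scale: float  # fraction of full latent frame count


def token_count(frames, height, width, spatial_scale=1.0, temporal_scale=1.0):
    """Number of latent tokens after scaling each dimension (at least 1 each)."""
    f = max(1, round(frames * temporal_scale))
    h = max(1, round(height * spatial_scale))
    w = max(1, round(width * spatial_scale))
    return f * h * w


def relative_cost(stages, frames, height, width, attention_exponent=2.0):
    """Schedule cost divided by the cost of running every step at full size.

    Assumed cost model: each step costs tokens ** attention_exponent
    (2.0 approximates attention-dominated compute, 1.0 linear compute).
    """
    full = token_count(frames, height, width) ** attention_exponent
    total_steps = sum(s.steps for s in stages)
    scheduled = sum(
        s.steps
        * token_count(frames, height, width, s.spatial_scale, s.temporal_scale)
        ** attention_exponent
        for s in stages
    )
    return scheduled / (total_steps * full)


# Hypothetical schedule: first half of the steps at half resolution and
# half frame count, second half at full size.
schedule = [Stage(25, 0.5, 0.5), Stage(25, 1.0, 1.0)]
joint = relative_cost(schedule, frames=16, height=32, width=32)

# Same step split, but compressing only the spatial dimensions.
spatial_only = [Stage(25, 0.5, 1.0), Stage(25, 1.0, 1.0)]
single = relative_cost(spatial_only, frames=16, height=32, width=32)

print(f"joint: {joint:.4f}, spatial only: {single:.4f}")
```

Under this toy model, halving all three latent dimensions cuts the early-stage token count by 8×, while halving only height and width cuts it by 4×. The overall ratio is still bounded by the steps that run at full size, so reported speedups depend on how many steps can tolerate compression, which is the content-dependent decision the paper targets.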
Problem

Research questions and friction points this paper is trying to address.

video generation
diffusion models
computational efficiency
spatio-temporal compression
acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Video Generation
Spatio-temporal Acceleration
Diffusion Models
Progressive Resolution Sampling
Content-aware Computation Allocation