Dynamic Video Generation: Shaping Video Generation Across Time and Space

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Video diffusion models are computationally expensive because they process a large number of spatiotemporal tokens at every denoising step, and existing acceleration methods often struggle to balance efficiency with generation quality. This work proposes Dynamic Video Generation (DVG), a framework that jointly and dynamically allocates computation across the spatial and temporal dimensions. DVG uses a content-aware mechanism to select an acceleration strategy for each video automatically, without manual hyperparameter tuning or model retraining. The framework is compatible with complementary acceleration techniques such as distillation and reports near-lossless speedup across models and tasks. On HunyuanVideo and HunyuanVideo-1.5, DVG achieves up to 7× acceleration on its own and up to 18× when combined with distillation.

📝 Abstract

Diffusion models have achieved impressive performance in video generation, but their iterative denoising process remains computationally expensive due to the large number of tokens processed at each timestep. Recently, progressive resolution sampling has emerged as a promising acceleration approach that reduces latent resolution in the early stages of sampling. However, scaling this idea to video generation remains challenging: the additional temporal dimension introduces spatio-temporal demands that differ from video to video, and compressing only a single dimension often leads to limited acceleration or degraded quality. We therefore propose DVG, a Dynamic Video Generation framework that jointly allocates computation across time and space, automatically selecting content-aware acceleration strategies without manual tuning or retraining. DVG achieves near-lossless acceleration across models and tasks, reaching up to a 7× speedup on HunyuanVideo and HunyuanVideo-1.5, and 18× when combined with distillation, demonstrating its potential as a key component in today's large-scale efficient video generation systems. Our code is included in the supplementary material and will be released on GitHub.
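To make the cost argument in the abstract concrete, the sketch below estimates how much compute a progressive schedule saves when early denoising steps run on a reduced latent. It is an illustration only, not DVG's selection mechanism: the `Stage` type, the two example schedules, the latent sizes, and the assumption that per-step cost grows with the square of the token count (as for full self-attention) are all hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One segment of a hypothetical progressive sampling schedule."""
    steps: int             # denoising steps run in this stage
    spatial_scale: float   # fraction of the full latent height and width
    temporal_scale: float  # fraction of the full latent frame count


def token_count(frames: int, height: int, width: int, stage: Stage) -> int:
    """Latent tokens processed per step at this stage's reduced size."""
    f = max(1, round(frames * stage.temporal_scale))
    h = max(1, round(height * stage.spatial_scale))
    w = max(1, round(width * stage.spatial_scale))
    return f * h * w


def relative_cost(frames, height, width, schedule, attention_exponent=2.0):
    """Cost of a schedule relative to running every step at full size.

    Assumption: per-step cost scales as tokens ** attention_exponent,
    a rough stand-in for attention-dominated transformer blocks.
    """
    full = token_count(frames, height, width, Stage(1, 1.0, 1.0))
    total_steps = sum(s.steps for s in schedule)
    baseline = total_steps * full ** attention_exponent
    cost = sum(
        s.steps * token_count(frames, height, width, s) ** attention_exponent
        for s in schedule
    )
    return cost / baseline


# Hypothetical latent grid: 32 frames of 48 x 80 tokens, 50 steps in total.
FRAMES, HEIGHT, WIDTH = 32, 48, 80

# Compress space only during the first 40 steps.
spatial_only = [Stage(40, 0.5, 1.0), Stage(10, 1.0, 1.0)]
# Compress space and time together during the first 40 steps.
joint = [Stage(40, 0.5, 0.5), Stage(10, 1.0, 1.0)]

for name, schedule in [("spatial only", spatial_only), ("joint", joint)]:
    ratio = relative_cost(FRAMES, HEIGHT, WIDTH, schedule)
    print(f"{name}: relative cost {ratio:.4f}, speedup {1 / ratio:.2f}x")
```

Under these assumptions the joint schedule costs less than the spatial-only one, which matches the abstract's point that compressing a single dimension limits the achievable acceleration. The estimate ignores quality, which is the part a content-aware method has to handle.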

Problem

Research questions and friction points this paper is trying to address.

video generation

diffusion models

computational efficiency

spatio-temporal compression

acceleration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Video Generation

Spatio-temporal Acceleration

Diffusion Models