Efficient Video Diffusion Models: Advancements and Challenges

📅 2026-04-17

🤖 AI Summary

Video diffusion models are difficult to deploy for high-fidelity video generation because of high inference cost, complex spatiotemporal computation, and substantial memory overhead. This work presents what its authors describe as the first systematic survey of efficient video diffusion models. It introduces a unified taxonomy of four mainstream optimization paradigms: step distillation, efficient attention mechanisms, model compression, and caching with trajectory optimization. These paradigms target two objectives: reducing the number of function evaluations and lowering per-step computational cost. The survey traces how the algorithms in each paradigm have evolved and what each one optimizes, and it highlights open directions including quality preservation under combined acceleration, hardware-software co-design, and long-horizon video generation. By offering a structured reference, the survey aims to facilitate standardized evaluation and practical deployment of efficient video generation technologies.

📝 Abstract

Video diffusion models have rapidly become the dominant paradigm for high-fidelity generative video synthesis, but their practical deployment remains constrained by severe inference costs. Compared with image generation, video synthesis compounds computation across spatiotemporal token growth and iterative denoising, making attention and memory traffic major bottlenecks in real-world settings. This survey provides a systematic and deployment-oriented review of efficient video diffusion models. We propose a unified categorization that organizes existing methods into four main paradigms: step distillation, efficient attention, model compression, and cache/trajectory optimization. Building on this categorization, we analyze the algorithmic trends within each paradigm and examine how different design choices target two core objectives: reducing the number of function evaluations and minimizing per-step overhead. Finally, we discuss open challenges and future directions, including quality preservation under composite acceleration, hardware-software co-design, robust real-time long-horizon generation, and open infrastructure for standardized evaluation. To the best of our knowledge, our work is the first comprehensive survey on efficient video diffusion models, offering researchers and engineers a structured overview of the field and its emerging research directions.
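The two objectives named in the abstract can be illustrated with a rough cost model. The sketch below is not taken from the survey: it assumes full self-attention over all spatiotemporal tokens, a hypothetical patch size of 2, and a simplified FLOP count that covers only the two attention matrix products. The names `VideoLatent`, `attention_flops`, and `total_attention_cost` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class VideoLatent:
    """Latent grid fed to the denoiser (all sizes are illustrative assumptions)."""
    frames: int
    height: int
    width: int
    patch: int = 2  # hypothetical spatial patch size

    @property
    def tokens(self) -> int:
        # Token count grows with frames x (height / patch) x (width / patch).
        return self.frames * (self.height // self.patch) * (self.width // self.patch)


def attention_flops(tokens: int, dim: int) -> int:
    # Simplified count: Q @ K^T and weights @ V, each about 2 * N * N * d FLOPs.
    return 4 * tokens * tokens * dim


def total_attention_cost(latent: VideoLatent, dim: int, layers: int, nfe: int) -> int:
    # Total cost = number of function evaluations x per-step cost.
    return nfe * layers * attention_flops(latent.tokens, dim)


image = VideoLatent(frames=1, height=64, width=64)
video = VideoLatent(frames=16, height=64, width=64)

# Per-step overhead: 16x more tokens means 256x more attention FLOPs.
per_step_ratio = attention_flops(video.tokens, 64) // attention_flops(image.tokens, 64)

# Function evaluations: going from 50 denoising steps to 4 cuts total cost 12.5x.
nfe_speedup = total_attention_cost(video, 64, 1, 50) / total_attention_cost(video, 64, 1, 4)

print(image.tokens, video.tokens, per_step_ratio, nfe_speedup)
```

Under these assumptions, the two factors multiply: methods that cut the number of function evaluations (such as step distillation) and methods that cut per-step cost (such as efficient attention) act on different terms of the same product.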

Problem

Research questions and friction points this paper is trying to address.

video diffusion models

inference cost

computational bottleneck

memory traffic

efficient generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

efficient video diffusion

step distillation

efficient attention