Efficient Autoregressive Video Diffusion with Dummy Head

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses an inefficiency in autoregressive video diffusion models: many multi-head self-attention heads under-utilize historical frames, yet their key-value (KV) caches are still stored and attended over, causing computational redundancy and slow inference. The authors propose Dummy Forcing, a method that analyzes each attention head's dependency on historical frames to optimize context access and caching strategies without requiring additional training. By combining heterogeneous memory allocation, dynamic head programming, and context packing, the approach compresses the KV cache and reduces inter-head redundancy. Evaluated on standard benchmarks, the method achieves up to a 2.0× speedup with less than 0.5% degradation in video generation quality, reaching an inference rate of 24.3 frames per second.

📝 Abstract
The autoregressive video diffusion model has recently gained considerable research interest due to its causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes historical frames: approximately 25% of heads attend almost exclusively to the current frame, and discarding their KV caches incurs only minor performance degradation. Building upon this, we propose Dummy Forcing, a simple yet effective method to control context accessibility across different heads. Specifically, the proposed heterogeneous memory allocation reduces head-wise context redundancy, accompanied by dynamic head programming to adaptively classify head types. Moreover, we develop a context packing technique to achieve more aggressive cache compression. Without additional training, our Dummy Forcing delivers up to 2.0x speedup over the baseline, supporting video generation at 24.3 FPS with less than 0.5% quality drop. Project page is available at https://csguoh.github.io/project/DummyForcing/.
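The core observation above, that some heads attend almost exclusively to the current frame, suggests a simple head-classification rule. The sketch below is a minimal illustration of that idea, not the paper's actual "dynamic head programming": it measures each head's average attention mass on historical key positions and flags heads below a threshold as "dummy" heads whose historical KV cache could be dropped. The function name, tensor layout, and the 5% threshold are all assumptions for illustration.

```python
import numpy as np

def classify_dummy_heads(attn, n_current, threshold=0.05):
    """Flag heads that attend almost exclusively to the current frame.

    attn:      softmaxed attention weights for one layer, shape
               (num_heads, q_len, kv_len); the last `n_current` key
               positions belong to the current frame, the rest to history.
    threshold: max average attention mass on history for a head to be
               considered "dummy" (hypothetical value, not from the paper).

    Returns a boolean mask over heads: True = candidate for dropping
    its historical KV cache at inference time.
    """
    hist = attn[:, :, :-n_current]               # weights on historical keys
    hist_mass = hist.sum(axis=-1).mean(axis=-1)  # avg historical mass per head
    return hist_mass < threshold

# Synthetic example: head 0 puts all mass on the current frame,
# head 1 spreads uniformly over all keys.
attn = np.zeros((2, 4, 8))
attn[0, :, 6:] = 0.5      # head 0: only the 2 current-frame keys
attn[1] = 1.0 / 8         # head 1: uniform over 8 keys
print(classify_dummy_heads(attn, n_current=2))  # [ True False]
```

In a real decoder these statistics would be gathered online over denoising steps, so the mask can adapt per layer and per input, which is presumably what "dynamic" head programming refers to.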
Problem

Research questions and friction points this paper is trying to address.

autoregressive video diffusion
multi-head self-attention
KV cache
context redundancy
video generation efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dummy Forcing
autoregressive video diffusion
multi-head self-attention
KV cache compression
context packing