Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation

📅 2026-04-13
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two problems in video diffusion Transformers: the computational cost of self-attention, which limits high-fidelity video generation, and the temporal flickering that existing sparse attention methods introduce through static sparsity patterns and deterministic block routing. The authors propose PASA (Precision-Allocated Sparse Attention), a training-free framework with three components: a curvature-aware mechanism that dynamically allocates the exact-computation budget across timesteps, hardware-aligned grouped approximations that capture fine-grained local variation, and a stochastic selection bias in attention routing. According to the authors, evaluations on leading video diffusion models show substantial inference acceleration together with structurally stable, temporally coherent video.
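
The curvature-aware budgeting idea can be illustrated with a short sketch. This is not the authors' implementation. It assumes that curvature is measured as the norm of the second finite difference of the latent trajectory, and that each step receives a floor budget plus a share proportional to its curvature. The function names and the `mean_budget`, `floor`, and `cap` parameters are hypothetical.

```python
import math


def second_difference_norms(trajectory):
    """Return ||x[t+1] - 2*x[t] + x[t-1]|| for every interior step t.

    `trajectory` is a list of equal-length vectors (lists of floats), one per
    denoising timestep. This finite-difference "acceleration" is an assumed
    stand-in for the trajectory curvature mentioned in the summary.
    """
    norms = []
    for t in range(1, len(trajectory) - 1):
        prev, cur, nxt = trajectory[t - 1], trajectory[t], trajectory[t + 1]
        acc = [n - 2.0 * c + p for p, c, n in zip(prev, cur, nxt)]
        norms.append(math.sqrt(sum(a * a for a in acc)))
    return norms


def allocate_budgets(curvatures, mean_budget=0.25, floor=0.05, cap=1.0):
    """Map per-step curvature to a fraction of attention blocks computed exactly.

    Each step gets `floor` plus a share proportional to its curvature, scaled
    so the average equals `mean_budget` when no step hits `cap`.
    (Hypothetical rule, chosen only for illustration.)
    """
    if not curvatures:
        return []
    avg = sum(curvatures) / len(curvatures)
    if avg == 0.0:
        # Perfectly straight trajectory: spread the budget uniformly.
        return [mean_budget] * len(curvatures)
    spread = mean_budget - floor
    return [min(cap, floor + spread * c / avg) for c in curvatures]


# Usage: a 1-D trajectory that is straight except for one kink.
trajectory = [[0.0], [1.0], [2.0], [4.0], [6.0], [8.0]]
budgets = allocate_budgets(second_difference_norms(trajectory))
```

In this toy trajectory the only nonzero second difference occurs at the kink, so that step receives the largest budget while the others fall back to the floor. Clipping at `cap` can lower the realized average, which a real scheduler would need to redistribute.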

📝 Abstract
Video Diffusion Transformers have revolutionized high-fidelity video generation but suffer from the massive computational burden of self-attention. While sparse attention provides a promising acceleration solution, existing methods frequently provoke severe visual flickering caused by static sparsity patterns and deterministic block routing. To resolve these limitations, we propose Precision-Allocated Sparse Attention (PASA), a training-free framework designed for highly efficient and temporally smooth video generation. First, we implement a curvature-aware dynamic budgeting mechanism. By profiling the generation trajectory acceleration across timesteps, we elastically allocate the exact-computation budget to secure high-precision processing strictly during critical semantic transitions. Second, we replace global homogenizing estimations with hardware-aligned grouped approximations, successfully capturing fine-grained local variations while maintaining peak compute throughput. Finally, we incorporate a stochastic selection bias into the attention routing mechanism. This probabilistic approach softens rigid selection boundaries and eliminates selection oscillation, effectively eradicating the localized computational starvation that drives temporal flickering. Extensive evaluations on leading video diffusion models demonstrate that PASA achieves substantial inference acceleration while consistently producing remarkably fluid and structurally stable video sequences.
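
To make the routing argument concrete, the following minimal sketch contrasts deterministic top-k block selection with a stochastic variant. The abstract does not specify the form of the stochastic selection bias, so Gumbel noise added to the block scores is an assumption made here purely for illustration, and `select_blocks` and `temperature` are hypothetical names.

```python
import math
import random


def select_blocks(scores, k, temperature=0.0, rng=None):
    """Pick k block indices for exact attention from per-block scores.

    temperature == 0 gives plain deterministic top-k routing.
    temperature > 0 perturbs each score with Gumbel noise before ranking,
    so near-tied blocks trade places across calls instead of one of them
    being skipped every time. (Assumed mechanism, not taken from the paper.)
    """
    rng = rng or random.Random()
    ranked = []
    for idx, score in enumerate(scores):
        if temperature > 0.0:
            # Clamp u away from 0 and 1 so both logarithms stay finite.
            u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
            score = score + temperature * -math.log(-math.log(u))
        ranked.append((score, idx))
    # Highest score first; ties broken by lower index for reproducibility.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(idx for _, idx in ranked[:k])


# Usage: blocks 1 and 2 are nearly tied, but only two blocks fit the budget.
scores = [0.9, 0.501, 0.499, 0.1]
rng = random.Random(0)
deterministic = select_blocks(scores, k=2)
hits_for_block_2 = sum(
    2 in select_blocks(scores, k=2, temperature=0.05, rng=rng)
    for _ in range(200)
)
```

With deterministic routing, block 2 is never selected even though its score is almost identical to that of block 1, which is the kind of localized starvation the abstract describes. With a small noise temperature, the two near-tied blocks share the budget across calls.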

Problem

Research questions and friction points this paper is trying to address.

video generation
sparse attention
visual flickering
temporal smoothness
computational burden

Innovation

Methods, ideas, or system contributions that make the work stand out.

Precision-Allocated Sparse Attention
dynamic budgeting
stochastic attention routing
temporal smoothness
video diffusion transformers