Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation

📅 2026-04-13

🤖 AI Summary

This work addresses two problems in video diffusion Transformers: the computational cost of self-attention, which limits high-fidelity video generation, and the temporal flickering that existing sparse attention methods introduce through static sparsity patterns and deterministic routing. The authors propose PASA (Precision-Allocated Sparse Attention), a training-free framework that dynamically allocates the exact-computation budget based on curvature-aware importance, replaces global estimates with hardware-aligned grouped approximations for local refinement, and adds a stochastic bias to attention routing. According to the authors, experiments on leading video diffusion models show that PASA substantially accelerates inference while suppressing visual flicker, producing videos with structurally stable content and temporally coherent dynamics.

📝 Abstract

Video Diffusion Transformers have revolutionized high-fidelity video generation but suffer from the massive computational burden of self-attention. While sparse attention provides a promising acceleration solution, existing methods frequently provoke severe visual flickering caused by static sparsity patterns and deterministic block routing. To resolve these limitations, we propose Precision-Allocated Sparse Attention (PASA), a training-free framework designed for highly efficient and temporally smooth video generation. First, we implement a curvature-aware dynamic budgeting mechanism. By profiling the generation trajectory acceleration across timesteps, we elastically allocate the exact-computation budget to secure high-precision processing strictly during critical semantic transitions. Second, we replace global homogenizing estimations with hardware-aligned grouped approximations, successfully capturing fine-grained local variations while maintaining peak compute throughput. Finally, we incorporate a stochastic selection bias into the attention routing mechanism. This probabilistic approach softens rigid selection boundaries and eliminates selection oscillation, effectively eradicating the localized computational starvation that drives temporal flickering. Extensive evaluations on leading video diffusion models demonstrate that PASA achieves substantial inference acceleration while consistently producing remarkably fluid and structurally stable video sequences.
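The abstract does not give the budgeting formula, so the following is only a minimal sketch of what "curvature-aware dynamic budgeting" could look like. It assumes the trajectory acceleration is measured as the norm of a second-order finite difference of the latents across timesteps, and that the exact-computation budget is a linear map of that value into a fixed range. The function names (`acceleration_profile`, `allocate_budgets`) and the parameters `min_frac` and `max_frac` are hypothetical and do not come from the paper.

```python
import math


def acceleration_profile(trajectory):
    """Norm of the second-order finite difference at each interior timestep.

    `trajectory` is a list of latent vectors (lists of floats), one per timestep.
    Assumption: this finite difference stands in for the paper's "acceleration".
    """
    acc = []
    for prev, cur, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        second_diff = [n - 2.0 * c + p for p, c, n in zip(prev, cur, nxt)]
        acc.append(math.sqrt(sum(d * d for d in second_diff)))
    return acc


def allocate_budgets(acc, min_frac=0.1, max_frac=0.5):
    """Map each acceleration value linearly into [min_frac, max_frac].

    The returned fractions are the share of attention blocks computed exactly
    at each timestep (hypothetical parameterization).
    """
    lo, hi = min(acc), max(acc)
    if hi == lo:
        # Flat profile: give every timestep the same mid-range budget.
        return [(min_frac + max_frac) / 2.0 for _ in acc]
    scale = (max_frac - min_frac) / (hi - lo)
    return [min_frac + (a - lo) * scale for a in acc]


# Toy 1-D trajectory with a sharp change between the third and fourth steps.
trajectory = [[0.0], [1.0], [2.0], [5.0], [6.0], [7.0]]
acc = acceleration_profile(trajectory)
budgets = allocate_budgets(acc)
```

In this toy case the two timesteps adjacent to the jump receive the largest budget, while the smooth stretches fall back to the minimum, which mirrors the stated goal of spending exact computation during critical transitions.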

Problem

Research questions and friction points this paper is trying to address.

video generation

sparse attention

visual flickering

temporal smoothness

computational burden

Innovation

Methods, ideas, or system contributions that make the work stand out.

Precision-Allocated Sparse Attention

dynamic budgeting

stochastic attention routing
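The page does not specify how the stochastic selection bias is applied, so the sketch below swaps in a standard technique, Gumbel-perturbed top-k selection, purely to illustrate how randomness can soften a hard block-selection boundary. The function `stochastic_top_k`, its `temperature` parameter, and the block scores are all hypothetical; this is not the authors' routing rule.

```python
import math
import random


def stochastic_top_k(scores, k, temperature=1.0, rng=None):
    """Select k block indices after adding Gumbel noise to their scores.

    With temperature == 0 this reduces to deterministic top-k. Larger values
    let blocks near the cutoff be picked on some calls, so a borderline block
    is not dropped on every step. (Illustrative stand-in, not PASA itself.)
    """
    rng = rng or random.Random()
    perturbed = []
    for idx, score in enumerate(scores):
        # Clamp the uniform sample away from 0 and 1 so both logs stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append((score + temperature * gumbel, idx))
    perturbed.sort(reverse=True)
    return sorted(idx for _, idx in perturbed[:k])


block_scores = [0.1, 0.9, 0.5, 0.7]  # hypothetical per-block importance
deterministic = stochastic_top_k(block_scores, k=2, temperature=0.0)
noisy = stochastic_top_k(block_scores, k=2, temperature=0.5, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the selection reproducible across runs, which is useful when comparing flicker between deterministic and perturbed routing.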