OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes OSP-Next, an efficient text-to-video generation framework that addresses the quadratic computational cost of full attention in diffusion Transformers. Its main components are: a hybrid full-sparse attention architecture whose sparse part, Skiparse-2D Attention, remains compatible with FlashAttention kernels; Sparse Sequence Parallelism (SSP), which reduces communication volume by 75% relative to Ulysses Sequence Parallelism; HiF8 8-bit quantization; and Mix-GRPO, a reinforcement learning post-training strategy. On VBench, the model achieves a total score of 83.73%, surpassing the Wan2.1 baseline. On NVIDIA H200 GPUs, it delivers up to 1.64× single-GPU speedup and over 1.52× eight-GPU speedup. On a single Ascend 950PR, OSP-Next-HiF8 reaches 1.69× and 2.27× speedups under the two tested settings, with only a 0.4% drop in VBench total score.
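As a rough illustration of why full attention becomes the bottleneck, the sketch below counts the query-key pairs that attention must score. The function names, the token count, and the factor `k` are illustrative assumptions, not values from the paper; the partitioned case simply assumes the sequence is split into `k` equal subsequences that attend only within themselves, which is a generic simplification rather than the exact Skiparse-2D pattern.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention: N * N."""
    return num_tokens * num_tokens


def partitioned_attention_pairs(num_tokens: int, k: int) -> int:
    """Pairs scored when N tokens are split into k equal subsequences
    that attend only within themselves: k * (N / k) ** 2 = N * N / k."""
    if num_tokens % k != 0:
        raise ValueError("num_tokens must be divisible by k")
    sub_len = num_tokens // k
    return k * sub_len * sub_len


if __name__ == "__main__":
    n = 32_768  # hypothetical token count, chosen only for illustration
    for k in (1, 2, 4):
        ratio = partitioned_attention_pairs(n, k) / full_attention_pairs(n)
        print(f"k={k}: {ratio:.2f} of the full-attention pairs")
```

Doubling the token count quadruples the full-attention pair count, while splitting into `k` subsequences divides it by `k`.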

📝 Abstract

Diffusion Transformers achieve strong video generation quality, but the quadratic cost of full attention limits efficiency. We introduce OSP-Next, an efficient text-to-video generation model that integrates sparse attention, parallelism, quantization, and reinforcement learning. OSP-Next uses a hybrid full-sparse attention architecture, where the sparse component is implemented with Skiparse-2D Attention. This fixed-pattern mechanism applies token-wise and group-wise sparse attention along spatial dimensions, leveraging locality while maintaining native compatibility with FlashAttention kernels. Based on the local equivalence of rearrangement in Skiparse-2D Attention, we further propose Sparse Sequence Parallelism (SSP), which partitions subsequences across ranks and switches sparse patterns through a single All-to-All communication. Compared with Ulysses Sequence Parallelism (SP), SSP provides a native parallel strategy for sparse attention and reduces communication volume by 75%. OSP-Next also incorporates HiF8 quantization to enable stable joint training with 8-bit quantization and sparse fine-tuning, and applies Mix-GRPO post-training to improve the performance of the sparse model. Experiments show that OSP-Next achieves a VBench total score of 83.73%, surpassing the Wan2.1 baseline. Under the 5-second 720P and 5-second 768P settings, OSP-Next achieves up to 1.64× single-GPU speedup and over 1.52× eight-GPU speedup on NVIDIA H200 GPUs. In addition, with only a 0.4% drop in VBench total score, OSP-Next-HiF8 achieves 1.69× and 2.27× speedups under the two settings on a single Ascend 950PR, demonstrating the efficiency and performance of OSP-Next across hardware platforms.
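The 75% figure is consistent with a simple count of All-to-All exchanges. The sketch below assumes the usual Ulysses layout, in which each attention layer performs four All-to-All exchanges (one each for the query, key, and value tensors before attention, and one for the output afterwards), and it assumes that the single All-to-All used by SSP moves a tensor of the same size. The equal-size assumption, the function names, and the example shapes are illustrative and are not taken from the paper.

```python
ULYSSES_EXCHANGES = 4  # Q, K, V before attention, output after (assumed layout)
SSP_EXCHANGES = 1      # "a single All-to-All" per the abstract (same size assumed)


def sent_per_exchange(seq_len: int, hidden: int, world_size: int) -> float:
    """Elements one rank sends in one All-to-All.

    Each rank holds seq_len / world_size tokens of width `hidden`, keeps
    1 / world_size of that shard, and sends the rest to the other ranks.
    """
    if world_size < 2:
        raise ValueError("sequence parallelism needs at least 2 ranks")
    local_elements = seq_len * hidden / world_size
    return local_elements * (world_size - 1) / world_size


def layer_volume(seq_len: int, hidden: int, world_size: int, exchanges: int) -> float:
    """Elements one rank sends per attention layer."""
    return exchanges * sent_per_exchange(seq_len, hidden, world_size)


def volume_reduction(seq_len: int, hidden: int, world_size: int) -> float:
    """Fraction of Ulysses traffic saved under the assumptions above."""
    ulysses = layer_volume(seq_len, hidden, world_size, ULYSSES_EXCHANGES)
    ssp = layer_volume(seq_len, hidden, world_size, SSP_EXCHANGES)
    return 1.0 - ssp / ulysses


if __name__ == "__main__":
    # Hypothetical shapes; the ratio does not depend on them.
    print(f"{volume_reduction(seq_len=32_768, hidden=1_536, world_size=8):.0%}")
```

Under these assumptions the saving is independent of sequence length, hidden size, and rank count, because only the number of exchanges differs.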

Problem

Research questions and friction points this paper is trying to address.

video generation

diffusion transformers

attention efficiency

computational cost

scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Sequence Parallelism

Skiparse-2D Attention

HiF8 Quantization