🤖 AI Summary
In video generation, the quadratic computational complexity of Transformer full attention severely hinders high-resolution and long-sequence modeling. To address this, we propose a neighborhood-adaptive block-wise attention mechanism: it dynamically adjusts the attention span under sparse patterns via learnable thresholds, preserving both local and global contextual modeling while substantially reducing computation. The method requires no custom kernels and natively supports PyTorch Flex Attention, ensuring seamless deployment. Integrated into the diffusion Transformer (DiT) framework, it enables end-to-end acceleration during both training and inference. Experiments show a 2.7x speedup over baseline full attention, with no statistically significant degradation in CLIP score, VBench metrics, or human perceptual evaluation, maintaining consistent visual fidelity. Our core contribution is a dynamic sparse attention design that jointly achieves computational efficiency, architectural generality, and high-fidelity generation.
📝 Abstract
Recent progress in transformer-based architectures has demonstrated remarkable success in video generation tasks. However, the quadratic complexity of full attention mechanisms remains a critical bottleneck, particularly for high-resolution and long-duration video sequences. In this paper, we propose NABLA, a novel Neighborhood Adaptive Block-Level Attention mechanism that dynamically adapts to sparsity patterns in video diffusion transformers (DiTs). By leveraging block-wise attention with an adaptive sparsity-driven threshold, NABLA reduces computational overhead while preserving generative quality. Our method does not require custom low-level operator design and can be seamlessly integrated with PyTorch's Flex Attention operator. Experiments demonstrate that NABLA achieves up to 2.7x faster training and inference compared to the full-attention baseline, with almost no loss in quantitative metrics (CLIP score, VBench score, human evaluation score) or visual quality. The code and model weights are available here: https://github.com/gen-ai-team/Wan2.1-NABLA
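To make the idea of an adaptive sparsity-driven threshold concrete, here is a minimal NumPy sketch of one plausible way to build a block-level attention mask: queries and keys are mean-pooled into blocks, a low-resolution attention map is computed over block pairs, and for each query block only the smallest set of key blocks covering a target probability mass is kept. The pooling scheme, the `keep_mass` parameter, and the top-p selection rule are illustrative assumptions, not the paper's exact procedure; the resulting boolean mask is the kind of input one could feed to PyTorch's Flex Attention block mask.

```python
import numpy as np

def block_sparse_mask(q, k, block, keep_mass=0.9):
    """Sketch of an adaptive block-level attention mask.

    q, k: (seq_len, dim) query/key matrices; seq_len must be divisible
    by `block`. Returns a boolean (n_blocks, n_blocks) mask marking,
    for each query block, which key blocks to attend to.
    Note: pooling and top-p selection here are assumptions for
    illustration, not NABLA's exact algorithm.
    """
    n = q.shape[0] // block
    # Mean-pool queries and keys block-wise to get a low-res proxy.
    qp = q.reshape(n, block, -1).mean(axis=1)
    kp = k.reshape(n, block, -1).mean(axis=1)
    # Low-resolution attention map over block pairs (scaled dot product).
    logits = qp @ kp.T / np.sqrt(q.shape[1])
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Per query block, keep the smallest set of key blocks whose
    # cumulative probability reaches keep_mass (adaptive threshold).
    order = np.argsort(-probs, axis=-1)
    sorted_p = np.take_along_axis(probs, order, axis=-1)
    cum = np.cumsum(sorted_p, axis=-1)
    keep_sorted = (cum - sorted_p) < keep_mass  # top block always kept
    mask = np.zeros_like(probs, dtype=bool)
    np.put_along_axis(mask, order, keep_sorted, axis=-1)
    return mask
```

A lower `keep_mass` yields a sparser mask (fewer key blocks per query block), trading quality for speed; `keep_mass=1.0` recovers dense full attention at the block level.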