Fast attention mechanisms: a tale of parallelism

πŸ“… 2025-09-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
The quadratic time complexity of standard attention severely limits the scalability of Transformers, even though they are expressive enough to simulate Massively Parallel Computation (MPC) algorithms. Method: The paper proposes ANNA (Approximate Nearest Neighbor Attention), a sub-quadratic approximate attention mechanism. Contribution/Results: ANNA-transformers are proven to retain the expressive power previously established for standard attention in terms of matching MPC algorithms, and to solve key reasoning tasks such as Match2 and k-hop with near-optimal depth. Within the same MPC framework, constant-depth ANNA-transformers are further shown to simulate constant-depth low-rank transformers, giving a unified way to reason about a broad class of efficient attention approximations while improving computational efficiency.

πŸ“ Abstract
Transformers have the representational capacity to simulate Massively Parallel Computation (MPC) algorithms, but they suffer from quadratic time complexity, which severely limits their scalability. We introduce an efficient attention mechanism called Approximate Nearest Neighbor Attention (ANNA) with sub-quadratic time complexity. We prove that ANNA-transformers (1) retain the expressive power previously established for standard attention in terms of matching the capabilities of MPC algorithms, and (2) can solve key reasoning tasks such as Match2 and $k$-hop with near-optimal depth. Using the MPC framework, we further prove that constant-depth ANNA-transformers can simulate constant-depth low-rank transformers, thereby providing a unified way to reason about a broad class of efficient attention approximations.
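The Match2 task mentioned in the abstract is, in the version common in this line of work (an assumption here; the paper's exact formulation may differ), a pairwise matching problem: token i is labeled 1 iff some other token j satisfies (x_i + x_j) ≑ 0 mod p. A minimal reference implementation of that labeling:

```python
def match2_labels(xs, p):
    """Match2 labeling (assumed formulation): token i gets label 1 iff
    there exists another token j with (xs[i] + xs[j]) % p == 0."""
    return [int(any((xi + xj) % p == 0 for j, xj in enumerate(xs) if j != i))
            for i, xi in enumerate(xs)]
```

Because the answer for token i depends on a single matching partner anywhere in the sequence, the task stresses exactly the kind of all-pairs interaction that dense attention handles in one layer and that efficient approximations must not lose.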
Problem

Research questions and friction points this paper is trying to address.

Reducing quadratic complexity in transformer attention mechanisms
Maintaining expressive power with efficient attention approximation
Enabling scalable simulation of massively parallel computation algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Approximate Nearest Neighbor Attention mechanism
Sub-quadratic time complexity design
Simulates constant-depth low-rank transformers
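The source does not spell out ANNA's construction, but the name suggests attention restricted to approximate nearest neighbors of each query. A minimal illustrative sketch, assuming random-hyperplane LSH bucketing (all names and the bucketing scheme are hypothetical, not the paper's method): each query attends only to keys hashed into the same bucket, so with small, balanced buckets the full n Γ— n score matrix is never formed.

```python
import numpy as np

def lsh_bucket_attention(Q, K, V, n_planes=4, seed=0):
    """Toy ANN-style attention: hash queries and keys with random
    hyperplanes; each query softmax-attends only within its bucket.
    Illustrative sketch only, not the paper's ANNA construction."""
    rng = np.random.default_rng(seed)
    n, d = Q.shape
    planes = rng.standard_normal((d, n_planes))
    # Bucket id = bit pattern of signs against the random hyperplanes.
    bits = 1 << np.arange(n_planes)
    q_codes = ((Q @ planes) > 0).astype(int) @ bits
    k_codes = ((K @ planes) > 0).astype(int) @ bits
    out = np.zeros_like(V)
    for i in range(n):
        mask = k_codes == q_codes[i]
        if not mask.any():
            continue  # query's bucket holds no keys: emit zeros
        scores = (K[mask] @ Q[i]) / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[i] = w @ V[mask]
    return out
```

High-inner-product query/key pairs tend to land in the same bucket, which is the intuition behind nearest-neighbor attention: most of the probability mass of softmax attention sits on the few keys closest to the query.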