🤖 AI Summary
Traditional attention mechanisms face scalability challenges because explicitly forming the attention matrix takes Ω(n²) time for sequences of length n. This work proposes the first quantum algorithm that approximates any single row of the attention output in time sublinear in n, below the Õ(nd) runtime of classical approximation schemes based on sparsity or low-rank structure. By combining quantum Nyström approximation, quantum multivariate mean estimation, and quantum leverage score sampling, the method achieves a preprocessing cost of Õ(ε⁻¹n⁰·⁵(s_λ²·⁵ + s_λ¹·⁵d + α⁰·⁵d)), after which each queried row can be computed in only Õ(s_λ² + s_λd) time. This represents a significant step toward efficient large-scale sequence processing through quantum-enhanced computation.
📝 Abstract
Given the query, key, and value matrices $Q, K, V\in \mathbb{R}^{n\times d}$, the attention module is defined as $\mathrm{Att}(Q, K, V)=D^{-1}AV$, where $A=\exp(QK^\top/\sqrt{d})$ with $\exp(\cdot)$ applied entrywise and $D=\mathrm{diag}(A{\bf 1}_n)$. The attention module is the backbone of modern transformers and large language models, but explicitly forming the softmax matrix $D^{-1}A$ incurs $\Omega(n^2)$ time, motivating numerous approximation schemes that reduce the runtime to $\widetilde O(nd)$ via sparsity or low-rank factorization. We propose a quantum data structure that approximates any row of $\mathrm{Att}(Q, K, V)$ using only row queries to $Q, K, V$. Our algorithm preprocesses these matrices in $\widetilde{O}\left( \epsilon^{-1} n^{0.5} \left( s_\lambda^{2.5} + s_\lambda^{1.5} d + \alpha^{0.5} d \right) \right)$ time, where $\epsilon$ is the target accuracy, $s_\lambda$ is the $\lambda$-statistical dimension of the exponential kernel defined by $Q$ and $K$, and $\alpha$ measures the row distortion of $V$ and is at most $d/{\rm srank}(V)$, where ${\rm srank}(V)$ denotes the stable rank of $V$. Each row query is then answered in $\widetilde{O}(s_\lambda^2 + s_\lambda d)$ time. To our knowledge, this is the first quantum data structure that approximates rows of the attention matrix in time sublinear in $n$. Our approach relies on a quantum Nyström approximation of the exponential kernel, quantum multivariate mean estimation for computing $D$, and quantum leverage score sampling for the multiplication with $V$.
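For concreteness, the definition above can be sketched in NumPy. This is the exact classical computation that the quantum data structure approximates row by row, not the quantum algorithm itself; `attention_row` shows the row-query view (a single output row from the $i$-th row of $Q$ and all of $K, V$), which classically still costs $O(nd)$:

```python
import numpy as np

def attention(Q, K, V):
    """Exact attention: Att(Q, K, V) = D^{-1} A V with A = exp(Q K^T / sqrt(d))."""
    d = Q.shape[1]
    A = np.exp(Q @ K.T / np.sqrt(d))      # n x n exponential kernel (entrywise exp)
    D_inv = 1.0 / A.sum(axis=1)           # diagonal of D^{-1}: inverse row sums of A
    return D_inv[:, None] * (A @ V)       # softmax-weighted combination of value rows

def attention_row(Q, K, V, i):
    """Single output row i, using one row query to Q and full access to K, V."""
    d = Q.shape[1]
    a = np.exp(Q[i] @ K.T / np.sqrt(d))   # i-th row of A, length n
    return (a @ V) / a.sum()              # normalize by the i-th row sum (D_{ii})

# Small sanity check on random inputs (n = 8, d = 4).
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(np.allclose(attention(Q, K, V)[3], attention_row(Q, K, V, 3)))  # True
```

Note that each row of the softmax matrix $D^{-1}A$ sums to one, which is why the row computation only needs the row sum $a.sum()$ rather than the full diagonal $D$.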