Sublinear Time Quantum Algorithm for Attention Approximation

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional attention mechanisms face scalability challenges on long sequences because explicitly forming the softmax matrix takes Ω(n²) time. This work proposes the first quantum algorithm that approximates any single row of the attention output in time sublinear in n, below the Õ(nd) runtime of classical approximation schemes. By integrating quantum Nyström approximation, quantum multivariate mean estimation, and quantum leverage score sampling, the method preprocesses the input matrices in Õ(ε⁻¹n⁰·⁵(s_λ²·⁵ + s_λ¹·⁵d + α⁰·⁵d)) time, after which each row query is answered in only Õ(s_λ² + s_λd) time. This represents a significant step toward efficient large-scale sequence processing through quantum-enhanced computation.

📝 Abstract
Given the query, key and value matrices $Q, K, V\in \mathbb{R}^{n\times d}$, the attention module is defined as $\mathrm{Att}(Q, K, V)=D^{-1}AV$ where $A=\exp(QK^\top/\sqrt{d})$ with $\exp(\cdot)$ applied entrywise, $D=\mathrm{diag}(A{\bf 1}_n)$. The attention module is the backbone of modern transformers and large language models, but explicitly forming the softmax matrix $D^{-1}A$ incurs $\Omega(n^2)$ time, motivating numerous approximation schemes that reduce runtime to $\widetilde O(nd)$ via sparsity or low-rank factorization. We propose a quantum data structure that approximates any row of $\mathrm{Att}(Q, K, V)$ using only row queries to $Q, K, V$. Our algorithm preprocesses these matrices in $\widetilde{O}\left( \epsilon^{-1} n^{0.5} \left( s_\lambda^{2.5} + s_\lambda^{1.5} d + \alpha^{0.5} d \right) \right)$ time, where $\epsilon$ is the target accuracy, $s_\lambda$ is the $\lambda$-statistical dimension of the exponential kernel defined by $Q$ and $K$, and $\alpha$ measures the row distortion of $V$ that is at most $d/{\rm srank}(V)$, the stable rank of $V$. Each row query can be answered in $\widetilde{O}(s_\lambda^2 + s_\lambda d)$ time. To our knowledge, this is the first quantum data structure that approximates rows of the attention matrix in sublinear time with respect to $n$. Our approach relies on a quantum Nyström approximation of the exponential kernel, quantum multivariate mean estimation for computing $D$, and quantum leverage score sampling for the multiplication with $V$.
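The abstract's definition of the attention module translates directly into a dense reference implementation. A minimal NumPy sketch of the classical baseline (the quantum data structure itself is not reproduced here); materializing the $n \times n$ matrix $A$ is exactly the $\Omega(n^2)$ bottleneck the paper targets:

```python
import numpy as np

def attention(Q, K, V):
    """Exact attention Att(Q, K, V) = D^{-1} A V with
    A = exp(Q K^T / sqrt(d)) entrywise and D = diag(A 1_n).
    Forming the n x n matrix A is the Omega(n^2) bottleneck."""
    d = Q.shape[1]
    A = np.exp(Q @ K.T / np.sqrt(d))              # n x n softmax numerator
    D_inv = 1.0 / A.sum(axis=1, keepdims=True)    # applies D^{-1} row by row
    return (D_inv * A) @ V                        # n x d output

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.standard_normal((3, n, d))
out = attention(Q, K, V)
```

Each output row is a convex combination of the rows of $V$, since every row of $D^{-1}A$ sums to one.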
Problem

Research questions and friction points this paper is trying to address.

attention approximation
sublinear time
quantum algorithm
softmax matrix
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

quantum attention
sublinear time algorithm
quantum Nyström approximation
quantum leverage score sampling
exponential kernel
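Among the listed ingredients, the Nyström step has a simple classical analogue. Below is a minimal sketch of a rank-$m$ Nyström approximation for a PSD exponential kernel on a single point set, with uniform landmark sampling standing in for the paper's quantum leverage-score sampling; the function name and parameters are illustrative, not taken from the paper:

```python
import numpy as np

def nystrom_exp_kernel(X, m, rng):
    """Rank-m Nystrom approximation of the PSD exponential kernel
    A_ij = exp(<x_i, x_j> / sqrt(d)). Only an n x m slice of A is
    ever formed; uniform landmarks replace quantum leverage scores."""
    n, d = X.shape
    S = rng.choice(n, size=m, replace=False)   # landmark indices
    C = np.exp(X @ X[S].T / np.sqrt(d))        # n x m slice of A
    W_pinv = np.linalg.pinv(C[S])              # pseudo-inverse of m x m core
    return S, C, W_pinv                        # A ~= C @ W_pinv @ C.T

rng = np.random.default_rng(1)
n, d, m = 64, 4, 16
X = rng.standard_normal((n, d))
S, C, W_pinv = nystrom_exp_kernel(X, m, rng)
A_hat = C @ W_pinv @ C.T                       # rank-m surrogate for A
```

The approximation is exact on the landmark block (since $W W^{+} W = W$), and its overall quality is governed by how fast the kernel's spectrum decays, which is what the $\lambda$-statistical dimension $s_\lambda$ in the stated runtimes captures.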
👥 Authors
Zhao Song (University of California, Berkeley, Theory)
Jianfei Xue (New York University)
Jiahao Zhang
Lichen Zhang (Massachusetts Institute of Technology)