Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions

📅 2025-07-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the efficient computation of Gaussian kernel matrix–vector products for asymmetric kernel matrices. The authors propose a subquadratic-time algorithm that, under a sparsity assumption (namely, that the total entry sum of the kernel matrix grows linearly in the input size $n$ rather than quadratically), achieves time complexity $O(n^{2-\alpha} d)$ for some $\alpha > 0$, linear space complexity, and a provable $L_2$-norm error guarantee for arbitrary input vectors. This is the first such result for general Gaussian kernels, breaking the standard $O(n^2 d)$ barrier inherent in naive attention computation. The method directly accelerates the core attention mechanism in large language models (LLMs). Empirical evaluation validates the sparsity assumption on real LLM attention matrices and demonstrates superior trade-offs between accuracy and efficiency compared to baseline approaches.
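The sparsity assumption above can be checked directly by forming a kernel matrix and comparing its entry sum to $n$. A minimal numpy sketch of such a check (synthetic random inputs are used purely for illustration; whether the assumption holds depends on the data, and the paper validates it on real LLM attention matrices):

```python
import numpy as np

def kernel_entry_sum(queries, keys, sigma=1.0):
    """Total entry sum of the n x n Gaussian kernel matrix
    K_ij = exp(-||q_i - k_j||_2^2 / (2 sigma^2)).
    The paper's sparsity assumption is that this sum grows
    linearly in n, rather than the worst-case n^2."""
    # Pairwise squared distances via ||q-k||^2 = ||q||^2 + ||k||^2 - 2<q,k>,
    # clipped at zero to guard against negative values from round-off.
    sq_q = np.sum(queries**2, axis=1)[:, None]  # shape (n, 1)
    sq_k = np.sum(keys**2, axis=1)[None, :]     # shape (1, n)
    dists = np.maximum(sq_q + sq_k - 2.0 * queries @ keys.T, 0.0)
    return float(np.exp(-dists / (2.0 * sigma**2)).sum())

# Illustrative check: if the assumption holds on a given data source,
# the ratio sum(K) / n stays roughly constant as n grows.
rng = np.random.default_rng(0)
for n in (128, 256, 512):
    q = rng.normal(size=(n, 8))
    k = rng.normal(size=(n, 8))
    print(n, kernel_entry_sum(q, k) / n)
```

This builds the full matrix and therefore costs $O(n^2)$ itself; it is a diagnostic for the modelling assumption, not part of the subquadratic algorithm.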

📝 Abstract
Motivated by the problem of fast processing of attention matrices, we study fast algorithms for computing matrix-vector products for asymmetric Gaussian kernel matrices $K \in \mathbb{R}^{n \times n}$. The columns of $K$ are indexed by a set of $n$ keys $k_1, k_2, \ldots, k_n \in \mathbb{R}^d$, its rows by a set of $n$ queries $q_1, q_2, \ldots, q_n \in \mathbb{R}^d$, and its $(i,j)$ entry is $K_{ij} = e^{-\|q_i - k_j\|_2^2 / 2\sigma^2}$ for some bandwidth parameter $\sigma > 0$. Given a vector $x \in \mathbb{R}^n$ and error parameter $\epsilon > 0$, our task is to output a $y \in \mathbb{R}^n$ such that $\|Kx - y\|_2 \leq \epsilon \|x\|_2$ in time subquadratic in $n$ and linear in $d$. Our algorithms rely on the following modelling assumption about the matrices $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to worst-case quadratic growth. We validate this assumption experimentally for Gaussian kernel matrices encountered in various settings, such as fast attention computation in LLMs. We obtain the first subquadratic-time algorithm that works under this assumption for unrestricted vectors.
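For reference, the quantity the paper approximates can be computed exactly by the quadratic-time baseline. A sketch of this naive $O(n^2 d)$ computation (this is the baseline being accelerated, not the paper's subquadratic algorithm):

```python
import numpy as np

def gaussian_kernel_matvec(queries, keys, x, sigma=1.0):
    """Exact y = K x for K_ij = exp(-||q_i - k_j||_2^2 / (2 sigma^2)).
    Costs O(n^2 d) time and O(n^2) space; the paper's algorithm
    approximates y to within eps * ||x||_2 in subquadratic time."""
    # Expand ||q - k||^2 = ||q||^2 + ||k||^2 - 2 <q, k> and clip
    # small negative values caused by floating-point round-off.
    sq_q = np.sum(queries**2, axis=1)[:, None]  # shape (n, 1)
    sq_k = np.sum(keys**2, axis=1)[None, :]     # shape (1, n)
    dists = np.maximum(sq_q + sq_k - 2.0 * queries @ keys.T, 0.0)
    K = np.exp(-dists / (2.0 * sigma**2))
    return K @ x

rng = np.random.default_rng(1)
n, d = 256, 16
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
x = rng.normal(size=n)
y = gaussian_kernel_matvec(q, k, x)
```

The distance expansion lets the whole computation run as two dense matrix products instead of an explicit double loop, but the asymptotic cost is unchanged.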

Problem

Research questions and friction points this paper is trying to address.

Develop fast algorithms for kernel matrix-vector multiplication
Achieve subquadratic time complexity under sparsity assumptions
Focus on asymmetric Gaussian kernel matrices in attention computation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast matrix-vector multiplication for Gaussian kernels
Subquadratic-time algorithm under sparsity assumptions
Validated for attention computation in LLMs