Nearly Optimal Attention Coresets

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of approximating the output of the attention mechanism in small space, with an error guarantee that holds uniformly over all queries of norm at most ρ. For unit-norm keys and values in ℝ^d, the authors prove the existence of sparse subsets of key-value pairs, termed ε-coresets, that satisfy this uniform error bound. The main contribution is an ε-coreset of size O(√d·e^{ρ+o(ρ)}/ε) together with a lower bound of Ω(√d·e^ρ/ε). The upper bound improves on the best previously known results, and the two bounds differ only by a factor of e^{o(ρ)}, so the optimal size of attention coresets is characterized nearly tightly.

📝 Abstract

We consider the problem of estimating the Attention mechanism in small space, and prove the existence of coresets for it of nearly optimal size. Specifically, we show that for any set of unit-norm keys and values $(K,V)$ in $\mathbb{R}^d$, there exists a subset $(K',V')$ of size at most $O(\sqrt{d}\, e^{\rho+o(\rho)}/\varepsilon)$ such that \[ \left\| \operatorname{Attn}(q,K,V)- \operatorname{Attn}(q,K',V') \right\| \le \varepsilon \] simultaneously for all queries whose norm is bounded by $\rho$. This outperforms the best known results for this problem. We also offer an improved lower bound showing that $\varepsilon$-coresets must have size $\Omega(\sqrt{d}\, e^{\rho}/\varepsilon)$.
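To make the guarantee concrete, the following is a minimal sketch of how the coreset error could be measured numerically. It rests on several assumptions that the abstract does not spell out: `Attn(q, K, V)` is taken to be the softmax-weighted average of the value rows with scores `K @ q` (no `1/sqrt(d)` scaling), the norm is Euclidean, and the supremum over queries is replaced by a maximum over random samples from the ball of radius `rho`. The subset here is chosen uniformly at random purely for illustration; it is not the paper's construction, and the helper names (`attn`, `unit_rows`, `max_sampled_error`) are hypothetical.

```python
import numpy as np

def attn(q, K, V):
    """Single-query softmax attention: softmax(K q)^T V (assumed definition)."""
    scores = K @ q
    scores = scores - scores.max()   # shift for numerical stability
    w = np.exp(scores)
    w /= w.sum()                     # softmax weights, non-negative, sum to 1
    return w @ V                     # convex combination of the value rows

def unit_rows(X):
    """Normalize every row of X to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def max_sampled_error(K, V, idx, rho, n_queries=200, seed=0):
    """Largest observed ||Attn(q,K,V) - Attn(q,K[idx],V[idx])|| over
    random queries with ||q|| <= rho. This only lower-bounds the true
    worst case, since it samples queries instead of covering the ball."""
    rng = np.random.default_rng(seed)
    d = K.shape[1]
    worst = 0.0
    for _ in range(n_queries):
        q = rng.standard_normal(d)
        # Rescale to a point drawn uniformly from the ball of radius rho.
        q *= rho * rng.uniform() ** (1.0 / d) / np.linalg.norm(q)
        err = np.linalg.norm(attn(q, K, V) - attn(q, K[idx], V[idx]))
        worst = max(worst, err)
    return worst

rng = np.random.default_rng(0)
n, d, rho = 2000, 16, 1.0
K = unit_rows(rng.standard_normal((n, d)))   # unit-norm keys
V = unit_rows(rng.standard_normal((n, d)))   # unit-norm values

idx = rng.choice(n, size=200, replace=False)  # illustrative random subset
full_err = max_sampled_error(K, V, np.arange(n), rho)
subset_err = max_sampled_error(K, V, idx, rho)
print(f"error using all pairs: {full_err:.3e}")
print(f"sampled error using {len(idx)} random pairs: {subset_err:.3e}")
```

A subset `idx` would qualify as an ε-coreset in the abstract's sense only if this error stayed below ε for every query in the ball, not just the sampled ones, so a check like this can refute a candidate subset but cannot certify it.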

Problem

Research questions and friction points this paper is trying to address.

Attention mechanism

coresets

space efficiency

approximation error

bounded query norm

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention mechanism

coresets

space-efficient approximation