Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes an efficient way to compute attention with Attention with Linear Biases (ALiBi) using locality-sensitive hashing (LSH). It connects positional biases, attention masks, and positional encodings in a single formal framework by showing that the ALiBi bias matrix is the expectation of contiguous block-diagonal binary masks. The method approximates ALiBi by averaging randomly sampled masks, which reduces long-context biased attention to a collection of short-context, positionally unbiased attention operations. The resulting approximation runs in near-linear time in the context length and, with high probability, is accurate simultaneously for all query-key-value inputs. Experiments on large language models validate the theoretical findings.
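
The expectation claim can be illustrated with a small numerical sketch. The summary does not spell out the sampling scheme, so the construction below is an assumption made for illustration and not necessarily the paper's: a block boundary is placed between each pair of adjacent positions independently with probability 1 - exp(-slope), so positions i and j share a block with probability exp(-slope * |i - j|), which is the exponentiated ALiBi bias. The function names and the `slope` parameter are hypothetical.

```python
import math
import random


def sample_block_ids(n, slope, rng):
    """Assign positions 0..n-1 to contiguous blocks.

    Assumption (not from the source): adjacent positions stay in the
    same block independently with probability exp(-slope).
    """
    keep = math.exp(-slope)
    ids = [0]
    for _ in range(1, n):
        ids.append(ids[-1] if rng.random() < keep else ids[-1] + 1)
    return ids


def mean_mask(n, slope, num_samples, seed=0):
    """Empirical mean of sampled block-diagonal 0/1 masks."""
    rng = random.Random(seed)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(num_samples):
        ids = sample_block_ids(n, slope, rng)
        for i in range(n):
            for j in range(n):
                if ids[i] == ids[j]:  # mask entry is 1 inside a block
                    acc[i][j] += 1.0
    return [[v / num_samples for v in row] for row in acc]


def alibi_kernel(n, slope):
    """Exponentiated ALiBi bias: exp(-slope * |i - j|)."""
    return [[math.exp(-slope * abs(i - j)) for j in range(n)]
            for i in range(n)]


def max_abs_diff(a, b):
    """Largest entrywise gap between two equally sized matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Calling `max_abs_diff(mean_mask(8, 0.25, 20000), alibi_kernel(8, 0.25))` should give a small value that shrinks as the number of samples grows. The symmetric |i - j| form is used here for simplicity; a causal model would additionally restrict to j <= i.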

📝 Abstract

Positional encoding in transformers is commonly implemented through positional embeddings, attention masks, or bias terms, but formal connections between these mechanisms remain limited. We study attention with positional bias through the lens of locality-sensitive hashing (LSH), focusing on Attention with Linear Biases (ALiBi). We show that the ALiBi bias matrix is the expectation of contiguous block-diagonal binary masks induced by a "positional LSH" scheme. The empirical mean of masks sampled from this scheme yields spectral norm and max-norm approximation guarantees with bounded block sizes with high probability. This structural theorem implies a uniform approximation theorem for ALiBi-biased attention: with high probability over the sampled masks, the approximate attention output is accurate simultaneously for all query-key-value inputs and can be computed in near-linear time in the context length, reducing long-context ALiBi to a collection of randomized short-context regular (positionally unbiased) attention operations. Conceptually, this connects positional bias, masks, and positional embeddings in a single formal framework and suggests an approach to efficient ALiBi-biased attention. Experiments on large language models validate our theoretical findings.
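
The reduction to short-context unbiased attention can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: blocks are sampled by cutting between adjacent positions independently with probability 1 - exp(-slope) (an assumed scheme), each query attends only to earlier keys in its own block with no positional bias, and the unnormalized numerators and denominators are accumulated across samples before dividing. All function names are hypothetical.

```python
import math
import random


def sample_block_ids(n, slope, rng):
    """Contiguous block labels; neighbours share a block w.p. exp(-slope)."""
    keep = math.exp(-slope)
    ids = [0]
    for _ in range(1, n):
        ids.append(ids[-1] if rng.random() < keep else ids[-1] + 1)
    return ids


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_alibi_attention(q, k, v, slope):
    """Causal softmax attention with logits q_i . k_j - slope * (i - j)."""
    out = []
    for i in range(len(q)):
        w = [math.exp(dot(q[i], k[j]) - slope * (i - j)) for j in range(i + 1)]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(i + 1)) / z
                    for c in range(len(v[0]))])
    return out


def block_sampled_attention(q, k, v, slope, num_samples, seed=0):
    """Approximate the above using only bias-free attention inside blocks."""
    rng = random.Random(seed)
    n, dv = len(q), len(v[0])
    num = [[0.0] * dv for _ in range(n)]  # accumulated weighted values
    den = [0.0] * n                       # accumulated softmax normalizers
    for _ in range(num_samples):
        ids = sample_block_ids(n, slope, rng)
        for i in range(n):
            j = i
            # Walk back only while key j lies in the same block as query i.
            while j >= 0 and ids[j] == ids[i]:
                w = math.exp(dot(q[i], k[j]))  # no positional bias term
                den[i] += w
                for c in range(dv):
                    num[i][c] += w * v[j][c]
                j -= 1
    return [[num[i][c] / den[i] for c in range(dv)] for i in range(n)]
```

In this sketch the work per sample scales with the block lengths rather than the full context length, which is the intuition behind the near-linear cost; the sketch itself provides no accuracy guarantee of the kind stated in the abstract.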

Problem

Research questions and friction points this paper is trying to address.

positional bias

Attention with Linear Biases

locality-sensitive hashing

long-context attention

transformer

Innovation

Methods, ideas, or system contributions that make the work stand out.

Positional LSH

ALiBi

binary block matrix approximation