TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses a fundamental challenge in high-dimensional vector quantization: simultaneously minimizing geometric distortion, in particular mean squared error (MSE) and inner-product distortion, while approaching information-theoretic lower bounds. The authors propose a data-oblivious, general-purpose quantization framework built on the paradigm of random orthogonal rotation followed by coordinate-wise scalar quantization, design a two-stage unbiased inner-product quantization mechanism, and derive, for the first time, an information-theoretic lower bound on inner-product distortion. They prove that their method's distortion rate is within a factor of approximately 2.7 of this bound. Experiments demonstrate quality-neutral KV cache compression at 3.5 bits per channel and near-lossless quality at 2.5 bits per channel; in approximate nearest neighbor search, the method surpasses product quantization in recall while reducing indexing time to virtually zero.
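The core pipeline the summary describes (random orthogonal rotation, then an independent scalar quantizer per coordinate) can be sketched as follows. This is a simplified stand-in: it uses a plain uniform grid per coordinate, whereas TurboQuant derives distribution-optimal scalar quantizers; the helper names are illustrative, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d, rng):
    # Haar-random orthogonal matrix via QR of a Gaussian matrix;
    # multiplying columns by sign(diag(R)) fixes the sign ambiguity.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def quantize(x, rotation, bits=4):
    # Rotate, then apply a coordinate-wise scalar quantizer.
    # (Uniform grid here; TurboQuant uses optimal scalar quantizers.)
    y = rotation @ x
    lo, hi = y.min(), y.max()
    step = (hi - lo) / (2 ** bits - 1)
    codes = np.round((y - lo) / step).astype(np.int64)
    return codes, lo, step

def dequantize(codes, lo, step, rotation):
    y_hat = lo + codes * step
    return rotation.T @ y_hat  # orthogonal, so inverse = transpose

d = 128
x = rng.standard_normal(d)
Q = random_rotation(d, rng)
codes, lo, step = quantize(x, Q, bits=4)
x_hat = dequantize(codes, lo, step, Q)
mse = np.mean((x - x_hat) ** 2)
```

Because the rotation is orthogonal, it preserves norms, so the per-coordinate quantization error translates directly into MSE on the original vector.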

📝 Abstract
Vector quantization, a problem rooted in Shannon's source coding theory, aims to quantize high-dimensional Euclidean vectors while minimizing distortion in their geometric structure. We propose TurboQuant to address both mean-squared error (MSE) and inner product distortion, overcoming limitations of existing methods that fail to achieve optimal distortion rates. Our data-oblivious algorithms, suitable for online applications, achieve near-optimal distortion rates (within a small constant factor) across all bit-widths and dimensions. TurboQuant achieves this by randomly rotating input vectors, inducing a concentrated Beta distribution on coordinates, and leveraging the near-independence property of distinct coordinates in high dimensions to simply apply optimal scalar quantizers per each coordinate. Recognizing that MSE-optimal quantizers introduce bias in inner product estimation, we propose a two-stage approach: applying an MSE quantizer followed by a 1-bit Quantized JL (QJL) transform on the residual, resulting in an unbiased inner product quantizer. We also provide a formal proof of the information-theoretic lower bounds on best achievable distortion rate by any vector quantizer, demonstrating that TurboQuant closely matches these bounds, differing only by a small constant ($\approx 2.7$) factor. Experimental results validate our theoretical findings, showing that for KV cache quantization, we achieve absolute quality neutrality with 3.5 bits per channel and marginal quality degradation with 2.5 bits per channel. Furthermore, in nearest neighbor search tasks, our method outperforms existing product quantization techniques in recall while reducing indexing time to virtually zero.
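The abstract's two-stage construction (an MSE quantizer, then a 1-bit QJL transform on the residual) can be sketched roughly as below. The coarse rounding grid stands in for the paper's MSE-optimal quantizer; the debiasing constant comes from the standard Gaussian identity E[sign(<s,r>) <s,q>] = sqrt(2/pi) * <q,r> / ||r||, not from the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def qjl_encode(r, S):
    # 1-bit QJL: keep only the signs of the Gaussian projection, plus ||r||.
    return np.sign(S @ r), np.linalg.norm(r)

def qjl_inner(q, signs, r_norm, S):
    # Unbiased estimator of <q, r> from signs alone, using
    # E[sign(<s,r>) <s,q>] = sqrt(2/pi) * <q,r> / ||r||.
    return r_norm * np.sqrt(np.pi / 2) * np.mean(signs * (S @ q))

d, m = 64, 20000
x = rng.standard_normal(d)
q = rng.standard_normal(d)

# Stage 1: a crude MSE quantizer (rounding to a coarse grid) stands in
# for TurboQuant's optimal scalar quantizer; it is biased for <q, x>.
step = 0.5
x_hat = np.round(x / step) * step
residual = x - x_hat

# Stage 2: 1-bit QJL on the residual removes the bias in expectation.
S = rng.standard_normal((m, d))
signs, r_norm = qjl_encode(residual, S)
est = x_hat @ q + qjl_inner(q, signs, r_norm, S)
true = x @ q
```

The residual norm is small after stage 1, so the variance of the sign-based correction is correspondingly small, which is what makes the 1-bit second stage affordable.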
Problem

Research questions and friction points this paper is trying to address.

Minimizing distortion when quantizing high-dimensional Euclidean vectors
Achieving near-optimal distortion rates across all bit-widths and dimensions
Correcting the inner-product estimation bias introduced by MSE-optimal quantizers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random orthogonal rotation enabling optimal per-coordinate scalar quantization
Two-stage MSE quantizer plus 1-bit QJL for unbiased inner products
Matches the information-theoretic distortion lower bound within a small constant factor
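The rotation bullet rests on the abstract's claim that random rotation induces a Beta distribution on coordinates. This can be checked numerically: a Haar-random rotation sends any fixed unit vector to a uniform point on the sphere, so the sketch samples sphere points directly (via normalized Gaussians, a standard equivalence) and compares the squared first coordinate against Beta(1/2, (d-1)/2).

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 128, 50000

# Uniform points on the unit sphere in R^d via normalized Gaussians.
g = rng.standard_normal((n, d))
u = g / np.linalg.norm(g, axis=1, keepdims=True)

# For a uniform sphere point, each squared coordinate ~ Beta(1/2, (d-1)/2).
c2 = u[:, 0] ** 2
beta_mean = 1.0 / d                        # mean of Beta(1/2, (d-1)/2)
beta_var = 2 * (d - 1) / (d**2 * (d + 2))  # variance of Beta(1/2, (d-1)/2)
```

Knowing the exact coordinate distribution is what lets a single scalar quantizer, tuned to that law, be applied identically to every coordinate.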