🤖 AI Summary
Storing and retrieving high-dimensional embedding vectors incurs substantial memory overhead and computational cost. To address this, we propose Non-uniform Vector Quantization (NVQ), the first method that learns a customizable, lightweight nonlinear transformation for each index vector to enable personalized non-uniform quantization. Departing from conventional uniform quantization schemes, NVQ models local data distributions via compact, differentiable functions, achieving high-fidelity compression with minimal computational overhead. Extensive experiments on standard benchmarks demonstrate that NVQ consistently outperforms state-of-the-art methods (including PQ, OPQ, and AQ) at equivalent compression ratios, improving average Recall@10 by 3.2–7.8 percentage points while maintaining millisecond-scale query latency. Our core contributions are: (i) the first individualized non-uniform quantization framework tailored for approximate nearest neighbor (ANN) search; (ii) an efficient, learnable nonlinear transformation mechanism; and (iii) joint optimization of accuracy, efficiency, and compression ratio.
📝 Abstract
Embedding vectors are widely used for representing unstructured data and searching through it for semantically similar items. However, the large size of these vectors, due to their high dimensionality, creates problems for modern vector search techniques: retrieving large vectors from memory/storage is expensive, and their footprint is costly. In this work, we present NVQ (non-uniform vector quantization), a new vector compression technique that is computationally and spatially efficient in the high-fidelity regime. The core idea in NVQ is to use novel, parsimonious, and computationally efficient nonlinearities for building non-uniform vector quantizers. Critically, these quantizers are *individually* learned for each indexed vector. Our experimental results show that NVQ exhibits improved accuracy compared to the state of the art at minimal computational cost.
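To make the idea of non-uniform quantization via a nonlinearity concrete, here is a minimal illustrative sketch (not the paper's actual method): a μ-law-style compander applied elementwise before uniform scalar quantization, so that the effective quantization grid in the original domain is non-uniform. The parameter `mu`, the bit width, and the companding function are all assumptions standing in for the learned, per-vector nonlinearity the abstract describes.

```python
import numpy as np

# Sketch: non-uniform scalar quantization via companding.
# A monotone nonlinearity (here mu-law, an assumption -- NVQ learns its own
# per-vector nonlinearity) maps values so that a *uniform* grid in the
# transformed domain becomes a *non-uniform* grid in the original domain.

def compress(x, mu):
    """Forward nonlinearity: allocates finer resolution near zero."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def expand(y, mu):
    """Exact inverse of compress."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def nonuniform_quantize(x, mu=16.0, bits=4):
    """Quantize uniformly in the companded domain, then invert."""
    levels = 2 ** bits - 1
    scale = np.max(np.abs(x))            # normalize to [-1, 1]
    y = compress(x / scale, mu)
    codes = np.round((y + 1.0) / 2.0 * levels)   # uniform grid on [-1, 1]
    y_hat = codes / levels * 2.0 - 1.0
    return expand(y_hat, mu) * scale     # dequantized reconstruction

rng = np.random.default_rng(0)
x = rng.normal(scale=0.1, size=128)      # small-magnitude entries dominate
x_hat = nonuniform_quantize(x)
mse = float(np.mean((x - x_hat) ** 2))
```

Because the compander concentrates quantization levels where the (assumed) data distribution puts most of its mass, a 4-bit code here reconstructs small-magnitude entries far more accurately than a uniform 4-bit grid would; NVQ's contribution is learning such a nonlinearity cheaply and individually per indexed vector.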