🤖 AI Summary
In approximate nearest neighbor search (ANNS), distance computation accounts for up to 99% of query latency, constituting the primary performance bottleneck. This paper proposes PANORAMA, a machine learning–driven orthogonal transformation method that adaptively concentrates signal energy into the first half of dimensions, enabling end-to-end integration without modifying existing indices. It combines a level-major memory layout, cache-aware access patterns, and SIMD-vectorized partial distance computation to support early candidate pruning. The key innovation is the first introduction of *learnable orthogonal transformations* into ANNS for cumulative distance-bound optimization, balancing accuracy and efficiency. Evaluated across diverse datasets, PANORAMA delivers a 2–30× end-to-end speedup over state-of-the-art indices, including IVFPQ, HNSW, MRPT, and Annoy, while preserving 100% recall.
📝 Abstract
Approximate Nearest-Neighbor Search (ANNS) efficiently finds data items whose embeddings are close to that of a given query in a high-dimensional space, aiming to balance accuracy with speed. Used in recommendation systems, image and video retrieval, natural language processing, and retrieval-augmented generation (RAG), ANNS algorithms such as IVFPQ, HNSW graphs, Annoy, and MRPT utilize graph, tree, clustering, and quantization techniques to navigate large vector spaces. Despite this progress, ANNS systems spend up to 99% of query time computing distances in their final refinement phase. In this paper, we present PANORAMA, a machine learning-driven approach that tackles the ANNS verification bottleneck through data-adaptive learned orthogonal transforms that facilitate the accretive refinement of distance bounds. Such transforms compact over 90% of signal energy into the first half of dimensions, enabling early candidate pruning with partial distance computations. We integrate PANORAMA into state-of-the-art ANNS methods, namely IVFPQ/Flat, HNSW, MRPT, and Annoy, without index modification, using level-major memory layouts, SIMD-vectorized partial distance computations, and cache-aware access patterns. Experiments across diverse datasets -- from image-based CIFAR-10 and GIST to modern embedding spaces including OpenAI's Ada 2 and Large 3 -- demonstrate that PANORAMA affords a 2–30× end-to-end speedup with no recall loss.
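To make the core idea concrete, here is a minimal sketch of the pruning mechanism the abstract describes: an orthogonal transform preserves Euclidean distances exactly, so a candidate can be discarded as soon as its cumulative partial distance over the leading dimensions exceeds the current k-th-best bound. This is not the paper's learned transform or SIMD implementation; as an assumption, a plain PCA rotation stands in for the learned orthogonal transform, and the helper name `pruned_search` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))  # toy database vectors
q = rng.normal(size=64)          # toy query vector

# Stand-in for the learned orthogonal transform: a PCA rotation,
# which also concentrates variance (signal energy) in the leading
# dimensions. Any orthogonal matrix preserves L2 distances.
_, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
Xr, qr = X @ Vt.T, q @ Vt.T

def pruned_search(db, query, k=5, block=16):
    """Exact k-NN with early abandoning on partial squared distances."""
    best = []  # (squared distance, index), kept sorted, size <= k
    for i, v in enumerate(db):
        bound = best[-1][0] if len(best) == k else np.inf
        acc = 0.0
        for start in range(0, len(query), block):
            d = v[start:start + block] - query[start:start + block]
            acc += d @ d           # cumulative partial squared distance
            if acc > bound:        # partial sum is a lower bound on the
                break              # full distance, so prune this candidate
        else:                      # loop finished: acc is the full distance
            best = sorted(best + [(acc, i)])[:k]
    return [i for _, i in best]

# Because the rotation preserves distances, searching the rotated data
# returns exactly the same neighbors as brute force on the original data.
ids = pruned_search(Xr, qr)
```

In the rotated space most of the squared-distance mass accrues within the first few blocks, so the `acc > bound` test fires early for most candidates; that is the effect the learned transform is designed to maximize.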