GPU-Native Approximate Nearest Neighbor Search with IVF-RaBitQ: Fast Index Build and Search

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets GPU-accelerated approximate nearest neighbor search (ANNS) that is simultaneously fast to build, high in query throughput, high in recall, and compact in storage. The authors propose IVF-RaBitQ (GPU), an end-to-end GPU-native pipeline that combines IVF clustering with RaBitQ low-bit quantization. For index build, it uses a scalable GPU-native RaBitQ quantization method; for search, it uses GPU-native distance computation on RaBitQ codes together with a fused search kernel. Integrated into the NVIDIA cuVS library and evaluated with cuVS Bench across multiple datasets, the approach achieves 2.2× higher queries per second (QPS) than the graph-based method CAGRA at a recall of approximately 0.95, while building indices 7.7× faster on average. Compared to the cluster-based method IVF-PQ, it delivers over 2.7× higher throughput on average without accessing the raw vectors for reranking.

📝 Abstract

Approximate nearest neighbor search (ANNS) on GPUs is increasingly popular for modern retrieval and recommendation workloads that operate over massive collections of high-dimensional vectors. Graph-based indexes deliver high recall and throughput but incur heavy build-time and storage costs. In contrast, cluster-based methods build and scale efficiently yet often need many probes for high recall, straining memory bandwidth and compute. Aiming to simultaneously achieve fast index build, high-throughput search, high recall, and low storage requirements on GPUs, we present IVF-RaBitQ (GPU), a GPU-native ANNS solution that integrates the cluster-based method IVF with RaBitQ quantization into an efficient GPU index build/search pipeline. Specifically, for index build, we develop a scalable GPU-native RaBitQ quantization method that enables fast and accurate low-bit encoding at scale. For search, we develop GPU-native distance computation schemes for RaBitQ codes and a fused search kernel to achieve high throughput with high recall. With IVF-RaBitQ implemented and integrated into the NVIDIA cuVS Library, experiments on cuVS Bench across multiple datasets show that IVF-RaBitQ offers a strong performance frontier in recall, throughput, index build time, and storage footprint. At a recall of approximately 0.95, IVF-RaBitQ achieves 2.2x higher QPS than the state-of-the-art graph-based method CAGRA, while also constructing indices 7.7x faster on average. Compared to the cluster-based method IVF-PQ, IVF-RaBitQ delivers on average over 2.7x higher throughput while avoiding accessing the raw vectors for reranking.
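
To make the general idea of pairing an inverted-file (IVF) index with low-bit codes more concrete, here is a minimal CPU sketch in plain Python. It is an illustration only and is not the paper's GPU pipeline or the cuVS API: it assumes the centroids are already given, stores one sign bit per dimension plus the residual norm, and uses a naive distance estimate. The real RaBitQ method is more involved (it applies a random rotation and uses a more careful estimator), and all function names below are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sign_code(vec, centroid):
    """One bit per dimension: the sign of the residual (vec - centroid)."""
    return [1 if v - c >= 0 else -1 for v, c in zip(vec, centroid)]

def build_ivf(data, centroids):
    """Assign each vector to its nearest centroid.

    Only (id, sign code, residual norm) is kept per vector; the raw
    vector itself is not stored in the inverted lists.
    """
    lists = [[] for _ in centroids]
    for vid, vec in enumerate(data):
        c = min(range(len(centroids)), key=lambda j: sq_dist(vec, centroids[j]))
        residual = [v - m for v, m in zip(vec, centroids[c])]
        norm = dot(residual, residual) ** 0.5
        lists[c].append((vid, sign_code(vec, centroids[c]), norm))
    return lists

def search(query, centroids, lists, nprobe, k):
    """Scan the nprobe closest lists and rank candidates by estimated distance.

    Simplifying assumption (not RaBitQ's estimator): each residual is
    approximated as norm * code / sqrt(dim), so
    ||q - x||^2 ~= norm^2 + ||q_res||^2 - 2 * norm * <code, q_res> / sqrt(dim).
    """
    dim = len(query)
    order = sorted(range(len(centroids)), key=lambda j: sq_dist(query, centroids[j]))
    candidates = []
    for c in order[:nprobe]:
        q_res = [q - m for q, m in zip(query, centroids[c])]
        q_norm2 = dot(q_res, q_res)
        for vid, code, norm in lists[c]:
            ip = norm * dot(code, q_res) / dim ** 0.5
            candidates.append((norm * norm + q_norm2 - 2 * ip, vid))
    candidates.sort()
    return [vid for _, vid in candidates[:k]]
```

In this sketch, raising `nprobe` scans more inverted lists, which is the recall versus memory-bandwidth trade-off the abstract describes for cluster-based methods. Candidates are ranked from the codes alone, mirroring in spirit the goal of not touching raw vectors for reranking.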

Problem

Research questions and friction points this paper is trying to address.

Approximate Nearest Neighbor Search

GPU

Index Build

Recall

Storage Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-native

IVF-RaBitQ

low-bit quantization