🤖 AI Summary
To address the high DRAM consumption, prohibitive deployment costs, slow index loading, and inefficient multi-index switching inherent in graph-based approximate nearest neighbor search (ANNS) for large-scale vector retrieval, this paper proposes the first fully storage-resident (SSD-based) ANNS architecture. Our method retains product-quantized (PQ) vectors entirely on SSD, reducing host memory footprint to ~10 MB. It integrates SSD-aware index layout, memory-constrained query scheduling, and a RAG-optimized interface, enabling end-to-end in-storage processing atop DiskANN. Evaluated on billion-scale datasets, our system achieves high recall with millisecond-level latency while supporting sub-second hot switching across multiple indices and drastically accelerating index loading. The implementation is open-sourced and designed for horizontal scaling across multi-server deployments.
📝 Abstract
Graph-based approximate nearest neighbor search (ANNS) algorithms work effectively for large-scale vector retrieval. Among such methods, DiskANN achieves a good recall-speed tradeoff by using both DRAM and storage. DiskANN adopts product quantization (PQ) to reduce memory usage, which nevertheless remains proportional to the scale of the dataset. In this paper, we propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads the compressed vectors to the SSD index. Our method achieves $\sim$10 MB memory usage during query search on billion-scale datasets without critical latency degradation. AiSAQ also reduces the index load time required to prepare for query search, enabling fast switching between multiple billion-scale indices. This method can be applied to the retrievers of retrieval-augmented generation (RAG) and scaled out on multi-server systems for emerging datasets. Our DiskANN-based implementation is available on GitHub.
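For context, the product quantization that DiskANN and AiSAQ rely on can be sketched as follows: each vector is split into sub-vectors, each sub-vector is mapped to its nearest codebook centroid, and the vector is stored as a few one-byte codes (the compressed data AiSAQ keeps on SSD rather than in DRAM). Query-time distances are then approximated from per-sub-space lookup tables. This is a minimal illustrative sketch, not code from the AiSAQ repository; all function names and parameters here are assumptions.

```python
import numpy as np

def train_pq_codebooks(data, m=4, k=256, iters=10, seed=0):
    """Train PQ codebooks: split vectors into m sub-spaces and run a
    small k-means in each sub-space (illustrative, not AiSAQ's trainer)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    ds = d // m
    codebooks = []
    for j in range(m):
        sub = data[:, j * ds:(j + 1) * ds]
        # Initialize centroids from distinct random samples.
        cent = sub[rng.choice(n, size=min(k, n), replace=False)]
        for _ in range(iters):
            # Assign each sub-vector to its nearest centroid.
            dist = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(1)
            for c in range(len(cent)):
                pts = sub[assign == c]
                if len(pts):
                    cent[c] = pts.mean(0)
        codebooks.append(cent)
    return codebooks

def pq_encode(data, codebooks):
    """Compress each vector to m one-byte codes -- the per-vector payload
    that AiSAQ stores on SSD instead of in host memory."""
    m = len(codebooks)
    ds = data.shape[1] // m
    codes = np.empty((data.shape[0], m), dtype=np.uint8)
    for j, cent in enumerate(codebooks):
        sub = data[:, j * ds:(j + 1) * ds]
        dist = ((sub[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
        codes[:, j] = dist.argmin(1)
    return codes

def pq_adc_distances(query, codes, codebooks):
    """Asymmetric distance computation: precompute one lookup table per
    sub-space for the query, then sum the entries picked by each code."""
    m = len(codebooks)
    ds = len(query) // m
    tables = [((codebooks[j] - query[j * ds:(j + 1) * ds]) ** 2).sum(1)
              for j in range(m)]
    return sum(tables[j][codes[:, j]] for j in range(m))
```

In a DiskANN-style search, only these byte codes (plus the graph) need to be read while traversing candidates; AiSAQ's contribution is keeping even the codes on SSD, so the host holds little more than the small codebooks.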