Quake: Adaptive Indexing for Vector Search

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address high query latency and low recall in approximate nearest neighbor (ANN) search under dynamic, skewed workloads, this paper proposes Quake, an adaptive indexing system for high-dimensional vector workloads with frequent updates and queries. Its core contributions are threefold: (1) a hierarchical partitioning scheme that adapts to updates and changing access patterns, guided by a cost model that predicts query latency from partition sizes and access frequencies; (2) a recall estimation model used to tune query execution parameters dynamically so that queries meet their recall targets; and (3) a NUMA-aware parallel query engine that improves memory bandwidth utilization. Evaluated on dynamic workloads, including a Wikipedia vector search workload, Quake achieves 1.5–22× lower query latency and 6–83× lower update latency than the state-of-the-art indexes SVS, DiskANN, HNSW, and SCANN.

📝 Abstract
Vector search, the task of finding the k-nearest neighbors of high-dimensional vectors, underpins many machine learning applications, including recommendation systems and information retrieval. However, existing approximate nearest neighbor (ANN) methods perform poorly under dynamic, skewed workloads where data distributions evolve. We introduce Quake, an adaptive indexing system that maintains low latency and high recall in such environments. Quake employs a hierarchical partitioning scheme that adjusts to updates and changing access patterns, guided by a cost model that predicts query latency based on partition sizes and access frequencies. Quake also dynamically optimizes query execution parameters to meet recall targets using a novel recall estimation model. Furthermore, Quake utilizes optimized query processing, leveraging NUMA-aware parallelism for improved memory bandwidth utilization. To evaluate Quake, we prepare a Wikipedia vector search workload and develop a workload generator to create vector search workloads with configurable access patterns. Our evaluation shows that on dynamic workloads, Quake achieves query latency reductions of 1.5-22x and update latency reductions of 6-83x compared to state-of-the-art indexes SVS, DiskANN, HNSW, and SCANN.
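As a rough illustration of how a cost model of this kind might guide partitioning, the sketch below estimates expected per-query scan cost from partition sizes and access frequencies, then checks whether splitting a partition would lower that estimate. It is not Quake's actual model: the `Partition` class, the linear cost formula, and the parameters `child_access_ratio` and `overhead_per_partition` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    size: int           # number of vectors stored in the partition
    access_freq: float  # fraction of queries that scan this partition (0..1)


def expected_scan_cost(partitions, cost_per_vector=1.0):
    """Expected per-query cost: each partition contributes its access
    frequency times the cost of scanning all of its vectors."""
    return sum(p.access_freq * cost_per_vector * p.size for p in partitions)


def split_benefit(p, child_access_ratio=0.6, overhead_per_partition=50.0,
                  cost_per_vector=1.0):
    """Estimated cost reduction from splitting `p` into two equal halves.

    Assumptions (hypothetical, for illustration only):
    - each child is scanned by `child_access_ratio` of the queries that
      previously scanned the parent;
    - every extra partition adds a fixed per-query overhead, for example
      one more centroid comparison.
    """
    before = p.access_freq * cost_per_vector * p.size
    child_freq = p.access_freq * child_access_ratio
    after = 2 * child_freq * cost_per_vector * (p.size / 2) + overhead_per_partition
    return before - after


hot = Partition(size=1000, access_freq=0.9)
cold = Partition(size=1000, access_freq=0.1)

total_cost = expected_scan_cost([hot, cold])
hot_gain = split_benefit(hot)    # positive: splitting is estimated to pay off
cold_gain = split_benefit(cold)  # negative: overhead outweighs the saving
```

Under these assumed parameters, two partitions of equal size get different decisions purely because of skew in access frequency, which is the kind of situation a size-only maintenance policy would miss.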
Problem

Research questions and friction points this paper is trying to address.

Adaptive indexing for dynamic skewed vector search workloads
Maintaining low latency and high recall as data distributions evolve
Optimizing query execution parameters for target recall accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

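Hierarchical partitioning that adapts to updates and changing access patterns, guided by a latency cost model
Dynamic optimization of query execution parameters to meet recall targets, using a recall estimation model
NUMA-aware parallelism for improved memory bandwidth utilization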
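
To make the idea of tuning query parameters against a recall target more concrete, here is a minimal sketch that picks which partitions to scan for a single query. It assumes a hypothetical estimator has already produced, for each partition, the estimated fraction of the query's true neighbors that it holds. The function `partitions_to_scan` and its inputs are illustrative only and do not reproduce Quake's recall estimation model.

```python
def partitions_to_scan(neighbor_fractions, recall_target):
    """Greedily pick partitions until the estimated recall meets the target.

    `neighbor_fractions[i]` is an assumed estimate of the fraction of the
    query's true neighbors held by partition i (values sum to at most 1).
    Returns the chosen partition indices and the estimated recall reached.
    """
    # Visit the most promising partitions first.
    order = sorted(range(len(neighbor_fractions)),
                   key=lambda i: neighbor_fractions[i],
                   reverse=True)
    chosen, estimated_recall = [], 0.0
    for i in order:
        if estimated_recall >= recall_target:
            break
        chosen.append(i)
        estimated_recall += neighbor_fractions[i]
    return chosen, estimated_recall


fractions = [0.05, 0.6, 0.25, 0.1]
chosen, estimated = partitions_to_scan(fractions, recall_target=0.9)
```

The number of partitions scanned becomes a per-query outcome rather than a fixed global setting, so queries that land in dense, well-separated regions stop early while harder queries scan more.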