Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation

📅 2024-04-30
🏛️ Applied Informatics
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Frequent insertions and deletions in dynamic data streams can invalidate approximate k-nearest neighbor (ANN) indexes and degrade query quality. To address this, the paper proposes combining adaptive re-indexing with incremental updates. It first systematically identifies the stability bottlenecks of ANN algorithms under dynamic workloads, then designs a lightweight, change-aware index-maintenance strategy that jointly optimizes accuracy and throughput. The approach integrates locality-sensitive hashing (LSH) with hierarchical navigable small world (HNSW) graphs and incorporates sliding-window sampling, localized graph repair, and deferred merging. On billion-scale dynamic vector datasets, the method achieves over 95% recall while reducing query latency by 40% and increasing update throughput by 3.2× relative to the state-of-the-art baselines FAISS-Dynamic and DiskANN-Delta.
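The summary does not say how incremental updates and deferred merging interact. One common pattern for dynamic indexes keeps a frozen main segment, a small buffer for recent inserts, and tombstones for deletions, then folds all pending changes into the main segment in a single batch once the buffer grows large enough. The sketch below illustrates only that generic pattern under our own assumptions: the class `BufferedIndex`, the `merge_threshold` parameter, and the brute-force search standing in for an HNSW graph or LSH tables are all hypothetical and are not the paper's method.

```python
import heapq
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BufferedIndex:
    """Hypothetical dynamic index: frozen main segment + insert buffer + tombstones.

    Brute-force search over the main segment is a stand-in for a real ANN
    structure (e.g. an HNSW graph); only the update/merge logic is illustrated.
    """

    def __init__(self, merge_threshold=4):
        self.main = {}           # id -> vector; only changed during a merge
        self.buffer = {}         # id -> vector; recent inserts and updates
        self.tombstones = set()  # main-segment ids deleted or shadowed since the last merge
        self.merge_threshold = merge_threshold
        self.merges = 0

    def insert(self, vid, vec):
        if vid in self.main:
            self.tombstones.add(vid)  # hide the stale main copy until the next merge
        self.buffer[vid] = list(vec)
        if len(self.buffer) >= self.merge_threshold:
            self._merge()

    def delete(self, vid):
        self.buffer.pop(vid, None)    # buffered entries can be dropped immediately
        if vid in self.main:
            self.tombstones.add(vid)  # main-segment entries are removed lazily

    def _merge(self):
        # Deferred merge: apply all tombstones and buffered inserts in one batch.
        for vid in self.tombstones:
            self.main.pop(vid, None)
        self.main.update(self.buffer)
        self.buffer.clear()
        self.tombstones.clear()
        self.merges += 1

    def search(self, query, k):
        # Scan live main-segment entries plus the buffer; keep the k closest ids.
        live_main = [(vid, v) for vid, v in self.main.items() if vid not in self.tombstones]
        candidates = [(_dist(query, v), vid) for vid, v in live_main + list(self.buffer.items())]
        return [vid for _, vid in heapq.nsmallest(k, candidates)]


if __name__ == "__main__":
    idx = BufferedIndex(merge_threshold=3)
    for i, v in enumerate([(0, 0), (1, 0), (5, 5)]):
        idx.insert(i, v)           # the third insert triggers a merge
    idx.delete(1)                  # tombstoned, still physically in main
    idx.insert(3, (0.5, 0))        # sits in the buffer
    print(idx.search((0, 0), 2))   # prints [0, 3]
```

Deferring the merge trades a little per-query work (scanning the buffer and checking tombstones) for far fewer expensive rebuilds of the main structure, which is the same accuracy-versus-throughput tension the summary describes.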

Problem

Approximate Nearest Neighbor Search
Dynamic Datasets
Index Updating Efficiency

Innovation

Approximate Nearest Neighbor
Dynamic Datasets
Hierarchical Navigable Small World Graphs