Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation

📅 2024-04-30

🏛️ Applied Informatics

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address index invalidation and query degradation in approximate k-nearest-neighbour (ANN) search over dynamic data streams, where frequent insertions and deletions erode index quality, this paper proposes combining adaptive re-indexing with incremental updates. It first systematically identifies the stability bottlenecks of ANN algorithms under dynamic workloads, then designs a lightweight, change-aware index-maintenance strategy that jointly optimizes accuracy and throughput. The approach integrates locality-sensitive hashing (LSH) with hierarchical navigable small world (HNSW) graphs and incorporates sliding-window sampling, localized graph repair, and deferred merging. Evaluated on billion-scale dynamic vector datasets, the method achieves over 95% recall while reducing query latency by 40% and increasing update throughput by 3.2×, outperforming the state-of-the-art baselines FAISS-Dynamic and DiskANN-Delta.
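The summary names localized graph repair as one of the maintenance techniques but gives no algorithm. The sketch below illustrates only the general idea and is not the paper's method. It keeps a single-layer proximity graph, roughly an HNSW bottom layer with no hierarchy and no LSH component. An insert links the new point to the nearest nodes found by beam search. A delete rewires only the nodes that pointed at the removed node, offering them its out-neighbours as replacement edges. The class `DynamicGraphIndex`, its parameters `max_degree` and `beam_width`, and the keep-the-closest pruning rule are hypothetical simplifications. Real HNSW uses a diversity heuristic when pruning, and a production index would store reverse links rather than scan for in-neighbours.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.dist(a, b)


class DynamicGraphIndex:
    """Hypothetical single-layer proximity graph with local repair on delete."""

    def __init__(self, max_degree=8, beam_width=16):
        self.max_degree = max_degree   # cap on out-edges per node
        self.beam_width = beam_width   # size of the search frontier (like HNSW's ef)
        self.vectors = {}              # id -> vector
        self.edges = {}                # id -> set of out-neighbour ids
        self.entry = None              # entry point for every search

    def _beam_search(self, query, k):
        """Best-first search from the entry point; returns up to k (distance, id) pairs."""
        if self.entry is None:
            return []
        d0 = dist(query, self.vectors[self.entry])
        visited = {self.entry}
        candidates = [(d0, self.entry)]   # min-heap: nodes still to expand
        results = [(-d0, self.entry)]     # max-heap via negation: best nodes so far
        while candidates:
            d, node = heapq.heappop(candidates)
            if len(results) >= self.beam_width and d > -results[0][0]:
                break  # the closest unexpanded node is worse than every kept result
            for nb in self.edges[node]:
                if nb in visited:
                    continue
                visited.add(nb)
                dn = dist(query, self.vectors[nb])
                if len(results) < self.beam_width or dn < -results[0][0]:
                    heapq.heappush(candidates, (dn, nb))
                    heapq.heappush(results, (-dn, nb))
                    if len(results) > self.beam_width:
                        heapq.heappop(results)  # drop the current farthest result
        return sorted((-nd, i) for nd, i in results)[:k]

    def _prune(self, node, candidate_ids):
        """Keep the max_degree candidates closest to node (simplified pruning rule)."""
        v = self.vectors[node]
        ranked = sorted((c for c in candidate_ids if c != node),
                        key=lambda c: dist(v, self.vectors[c]))
        self.edges[node] = set(ranked[: self.max_degree])

    def insert(self, pid, vec):
        """Link a new point (pid must be unused) to its nearest nodes found by search."""
        neighbours = [i for _, i in self._beam_search(vec, self.max_degree)]
        self.vectors[pid] = vec
        self.edges[pid] = set(neighbours)
        for nb in neighbours:              # add reverse edges, pruning over-full lists
            self.edges[nb].add(pid)
            if len(self.edges[nb]) > self.max_degree:
                self._prune(nb, self.edges[nb])
        if self.entry is None:
            self.entry = pid

    def delete(self, pid):
        """Remove pid and repair only the nodes that had an edge to it."""
        out_nbrs = self.edges.pop(pid)
        del self.vectors[pid]
        # Linear scan for in-neighbours keeps the sketch short; a real index
        # would store reverse links instead.
        for node, nbrs in list(self.edges.items()):
            if pid in nbrs:
                nbrs.discard(pid)
                # Offer the deleted node's out-neighbours as replacement edges.
                self._prune(node, nbrs | out_nbrs)
        if self.entry == pid:
            self.entry = next(iter(self.vectors), None)

    def search(self, query, k):
        """Return the ids of (approximately) the k nearest live points."""
        return [i for _, i in self._beam_search(query, k)]


index = DynamicGraphIndex()
for i, x in enumerate([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]):
    index.insert(i, (x,))
index.delete(2)
print(index.search((2.2,), 2))  # [3, 1]
```

In the usage example, six points on a line are inserted and the point at 2.0 is deleted. A query at 2.2 then returns ids 3 and 1, because the repair step has already removed every edge to id 2. The design choice being illustrated is that a delete rewrites only the edge lists of the removed node's in-neighbours instead of rebuilding the graph. Search quality after many such repairs depends on the pruning rule, which this sketch deliberately keeps naive.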