🤖 AI Summary
Existing approximate nearest neighbor (ANN) graph indexes such as HNSW support only insertions and lack efficient, high-quality dynamic deletion mechanisms, leading to degraded recall, increased query latency, or prohibitively high deletion overhead. This paper introduces the first dynamic ANN indexing framework grounded in random walk theory, one that strictly preserves the original hitting-time statistics after deletions. We further propose a deterministic deletion algorithm that jointly optimizes query latency, recall, deletion time, and memory footprint through dynamic graph maintenance and multi-layer navigation refinement. Extensive experiments demonstrate that, compared to state-of-the-art deletion methods, our approach achieves up to 12% higher recall, 40% faster deletion, 22% lower query latency, and 18% lower memory consumption.
📝 Abstract
Approximate nearest neighbor (ANN) search is a common way to retrieve relevant search results, especially in the context of large language models and retrieval-augmented generation. One of the most widely used ANN algorithms constructs a multi-layer graph over the dataset, called the Hierarchical Navigable Small World (HNSW) graph. While this algorithm supports insertion of new data, it does not support deletion of existing data, and the deletion algorithms described in prior work come at the cost of increased query latency, decreased recall, or prolonged deletion time. In this paper, we propose a new theoretical framework for graph-based ANN based on random walks. Using this framework, we analyze a randomized deletion approach that preserves the hitting-time statistics of the graph from before the point was deleted. We then turn this randomized approach into a deterministic deletion algorithm and show, through an extensive collection of experiments, that it provides a better tradeoff between query latency, recall, deletion time, and memory usage.
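To make the central notion concrete: the hitting time from u to v is the expected number of steps a simple random walk starting at u takes to first reach v. The sketch below is purely illustrative and is not the paper's algorithm; the toy graph, the naive "reconnect the neighbors" repair, and the function names are all assumptions. It shows how a naive deletion changes hitting times, which is exactly the distortion the proposed framework is designed to avoid:

```python
# Illustrative sketch: exact hitting times on a small undirected graph,
# and how a naive "reconnect the neighbors" deletion changes them.
# NOT the paper's algorithm; graph and repair heuristic are assumptions.

def hitting_times(graph, target, iters=10000, tol=1e-12):
    """Expected steps for a simple random walk to first reach `target`.

    Solves h(target) = 0, h(v) = 1 + mean(h(u) for u in neighbors(v))
    by fixed-point iteration (sufficient for small connected graphs).
    """
    h = {v: 0.0 for v in graph}
    for _ in range(iters):
        delta = 0.0
        for v in graph:
            if v == target:
                continue
            new = 1.0 + sum(h[u] for u in graph[v]) / len(graph[v])
            delta = max(delta, abs(new - h[v]))
            h[v] = new
        if delta < tol:
            break
    return h

def delete_reconnect(graph, v):
    """Naive deletion: remove v and fully connect its former neighbors."""
    nbrs = graph.pop(v)
    for u in nbrs:
        graph[u].discard(v)
    for a in nbrs:
        for b in nbrs:
            if a != b:
                graph[a].add(b)

# Path graph 0 - 1 - 2: hitting time from 0 to 2 is 4 steps.
g = {0: {1}, 1: {0, 2}, 2: {1}}
before = hitting_times(g, target=2)[0]   # 4.0

delete_reconnect(g, 1)                   # now 0 - 2 directly
after = hitting_times(g, target=2)[0]    # 1.0
```

Here the naive repair shortens the expected walk from 4 steps to 1, so query behavior on the repaired graph no longer mirrors the original; a hitting-time-preserving deletion, as the abstract describes, would instead rewire the graph so these statistics match the graph from before the deletion.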