🤖 AI Summary
This work addresses the I/O bottleneck in disk-resident graph-based approximate nearest neighbor (ANN) search under memory-constrained settings, a space where a systematic understanding of storage, layout, and execution strategies has been lacking. The authors propose a unified taxonomy that decomposes such systems into five core components: storage strategy, disk layout, cache management, query execution, and update mechanism. Through fine-grained evaluation and end-to-end experiments, they uncover several non-intuitive insights: vector dimensionality significantly affects component effectiveness; current disk layouts exhibit I/O utilization consistently at or below 15%; smaller page sizes outperform larger ones under optimized layouts; and update strategies must be tailored to specific workloads. These findings provide empirical foundations and practical design guidelines for building efficient disk-based ANN systems.
📝 Abstract
As data volumes grow while memory capacity remains limited, disk-resident graph-based approximate nearest neighbor (ANN) methods have become a practical alternative to memory-resident designs, shifting the bottleneck from computation to disk I/O. However, since their technical designs diverge widely across storage, layout, and execution paradigms, a systematic understanding of their fundamental performance trade-offs remains elusive. This paper presents a comprehensive experimental study of disk-resident graph-based ANN methods. First, we decompose such systems into five key technical components, namely storage strategy, disk layout, cache management, query execution, and update mechanism, and build a unified taxonomy of existing designs across these components. Second, we conduct fine-grained evaluations of representative strategies for each technical component to analyze the trade-offs in throughput, recall, and resource utilization. Third, we perform comprehensive end-to-end experiments and parameter-sensitivity analyses to evaluate overall system performance under diverse configurations. Fourth, our study reveals several non-obvious findings: (1) vector dimensionality fundamentally reshapes component effectiveness, necessitating dimension-aware design; (2) existing layout strategies exhibit surprisingly low I/O utilization (≤ 15%); (3) page size critically affects feasibility and efficiency, with smaller pages preferred when layouts are carefully optimized; and (4) update strategies present clear workload-dependent trade-offs between in-place and out-of-place designs. Based on these findings, we derive practical guidelines for system design and configuration, and outline promising directions for future research.
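To make the I/O-utilization finding concrete: utilization can be measured as the fraction of each fetched disk page that a query actually consumes, e.g., the visited vertex's vector and adjacency list versus the full page. The sketch below uses hypothetical sizes (128-dim float32 vectors, 64 neighbor IDs, 4 KiB pages) that are illustrative only and not taken from the paper's experiments.

```python
def io_utilization(useful_bytes: int, page_size: int, pages_read: int = 1) -> float:
    """Fraction of fetched bytes that the query actually uses."""
    return useful_bytes / (page_size * pages_read)

# Hypothetical node record: a 128-dim float32 vector (512 B) plus a
# 64-entry adjacency list of 4-byte neighbor IDs (256 B), fetched
# from a single 4 KiB page.
useful = 128 * 4 + 64 * 4   # 768 bytes actually consumed
util = io_utilization(useful, 4096)
print(f"{util:.1%}")        # → 18.8%
```

Even this optimistic single-page case wastes most of the fetched bandwidth, which is why layout strategies that pack co-accessed nodes onto the same page matter so much.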