I/O Optimizations for Graph-Based Disk-Resident Approximate Nearest Neighbor Search: A Design Space Exploration

📅 2026-02-24
🤖 AI Summary
In SSD-backed approximate nearest neighbor (ANN) search, I/O overhead accounts for 70%–90% of total query cost, severely limiting performance. This work systematically investigates the joint optimization space across memory layout, disk layout, and search algorithms for graph-based disk ANN systems. We introduce a page-level complexity model to quantify how page locality and path length impact I/O, revealing cross-dimensional synergies. Building on these insights, we design OctopusANN, a high-performance system that integrates memory-resident navigation, dynamic beam width, page reordering, and intra-page search strategies, all deeply tailored to graph index structures and SSD characteristics. Experimental results show that at Recall@10 = 90%, OctopusANN achieves 4.1%–37.9% higher throughput than Starling and 87.5%–149.5% higher throughput than DiskANN, significantly reducing I/O overhead.
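The toy count below compares two node-to-page layouts for the same search path, which shows how page locality affects page reads. It is a minimal sketch under stated assumptions, not the paper's page-level complexity model or OctopusANN's reordering algorithm. It assumes that every visited node must be fetched from SSD and that a page, once read, stays buffered for the rest of the query. The helpers `page_reads`, `sequential_layout`, and `grouped_layout` and the example path are hypothetical.

```python
from typing import Dict, List, Sequence


def page_reads(path: Sequence[int], node_to_page: Dict[int, int]) -> int:
    # Assumption (hypothetical model): every node on the search path must be
    # fetched, and a page read once stays buffered for the rest of the query,
    # so each distinct page touched costs exactly one SSD read.
    return len({node_to_page[v] for v in path})


def sequential_layout(num_nodes: int, nodes_per_page: int) -> Dict[int, int]:
    # Baseline: nodes packed onto pages in id order, ignoring graph structure.
    return {v: v // nodes_per_page for v in range(num_nodes)}


def grouped_layout(groups: List[List[int]]) -> Dict[int, int]:
    # Hypothetical reordered layout: the nodes in each inner list share a page,
    # standing in for the output of some locality-aware page reordering.
    return {v: page for page, group in enumerate(groups) for v in group}


path = [0, 5, 9, 2, 7, 11]  # a toy 6-hop search path
print(page_reads(path, sequential_layout(12, 2)))  # → 6
print(page_reads(path, grouped_layout([[0, 5], [9, 2], [7, 11], [1, 3], [4, 6], [8, 10]])))  # → 3
```

With two nodes per page, placing consecutively visited nodes on the same page halves the reads for this path. A real reordering scheme has to choose its groupings from the graph alone, without knowing future query paths, so its gains depend on how well graph neighborhoods predict search paths.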

📝 Abstract
Approximate nearest neighbor (ANN) search on SSD-backed indexes is increasingly I/O-bound (I/O accounts for 70%–90% of query latency). We present an I/O-first framework for disk-based ANN that organizes techniques along three dimensions: memory layout, disk layout, and search algorithm. We introduce a page-level complexity model that explains how page locality and path length jointly determine page reads, and we validate the model empirically. Using consistent implementations across four public datasets, we quantify both single-factor effects and cross-dimensional synergies. We find that (i) memory-resident navigation and dynamic width provide the strongest standalone gains; (ii) page shuffle and page search are weak alone but complementary together; and (iii) a principled composition, OctopusANN, substantially reduces I/O and achieves 4.1%–37.9% higher throughput than the state-of-the-art system Starling and 87.5%–149.5% higher throughput than DiskANN at matched Recall@10 = 90%. Finally, we distill actionable guidelines for selecting storage-centric or hybrid designs across diverse concurrency levels and accuracy constraints, advocating systematic composition rather than isolated tweaks when pushing the performance frontier of disk-based ANN.
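The abstract credits memory-resident navigation with one of the strongest standalone gains. One plausible mechanism is that choosing the on-disk entry point from a small in-memory sample shortens the disk-resident search path and therefore the page reads. The toy below sketches that mechanism and is not OctopusANN's implementation. It assumes 1-D coordinates in place of vectors, a DiskANN-style best-first beam search with candidate list size `list_size` and beam width `width`, and a per-query page buffer. The functions `disk_beam_search` and `nearest_in_memory` and the chain graph are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def disk_beam_search(graph: Dict[int, List[int]], coords: Dict[int, float],
                     node_to_page: Dict[int, int], query: float, entry: int,
                     list_size: int, width: int) -> Tuple[List[int], int, int]:
    """Toy best-first beam search that counts SSD page reads and search rounds.

    Hypothetical assumptions: 1-D coordinates stand in for vectors, expanding
    a node requires its page, and pages stay buffered until the query ends.
    """
    def dist(v: int) -> float:
        return abs(coords[v] - query)

    candidates: Dict[int, float] = {entry: dist(entry)}
    expanded = set()
    loaded_pages = set()
    page_reads = 0
    rounds = 0
    while True:
        # Keep only the list_size closest candidates (ties broken by node id).
        best = sorted(candidates, key=lambda v: (candidates[v], v))[:list_size]
        candidates = {v: candidates[v] for v in best}
        # Expand up to `width` of the closest not-yet-expanded candidates.
        frontier = [v for v in best if v not in expanded][:width]
        if not frontier:
            return best, page_reads, rounds
        rounds += 1
        for v in frontier:
            page = node_to_page[v]
            if page not in loaded_pages:  # only unbuffered pages cost an I/O
                loaded_pages.add(page)
                page_reads += 1
            expanded.add(v)
            for u in graph[v]:
                candidates.setdefault(u, dist(u))


def nearest_in_memory(sample: Sequence[int], coords: Dict[int, float], query: float) -> int:
    # Hypothetical memory-resident navigation: pick the entry point from an
    # in-memory sample so the on-disk search starts closer to the query.
    return min(sample, key=lambda v: abs(coords[v] - query))


# Toy index: an 8-node chain with two nodes per page.
coords = {v: float(v) for v in range(8)}
graph = {v: [u for u in (v - 1, v + 1) if 0 <= u < 8] for v in range(8)}
node_to_page = {v: v // 2 for v in range(8)}

entry = nearest_in_memory([0, 4], coords, query=7.0)  # → 4
print(disk_beam_search(graph, coords, node_to_page, 7.0, 0, list_size=3, width=1))      # → ([7, 6, 5], 4, 8)
print(disk_beam_search(graph, coords, node_to_page, 7.0, entry, list_size=3, width=1))  # → ([7, 6, 5], 2, 4)
```

On this chain, starting from node 4 instead of node 0 halves both the page reads (4 to 2) and the search rounds (8 to 4) while returning the same top-3 result. Real indexes trade this saving against the memory spent on the navigation sample.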

Problem

Research questions and friction points this paper is trying to address.

Approximate Nearest Neighbor
I/O Optimization
Disk-Resident Index
SSD
Page Locality

Innovation

Methods, ideas, or system contributions that make the work stand out.

I/O optimization
disk-based ANN
page-level complexity model
OctopusANN
design space exploration