Updating Graph-based Index with Fine-grained Blocks for Large-scale Streaming High-dimensional Vectors

📅 2025-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the imbalance between update efficiency and query accuracy in graph indexes for streaming high-dimensional vector data under small-batch dynamic updates, this paper proposes Greator, a disk-based system that couples a fine-grained block-level update mechanism with a lightweight incremental graph repair strategy. Specifically, it uses disk-friendly, localized block-level updates to cut redundant I/O, and integrates low-overhead neighborhood reconnection and edge pruning to maintain index quality in real time. Unlike large-batch delayed updates, which degrade accuracy, the approach enables continuous, timely index adaptation. Experimental results show that, while preserving low query latency and high approximate nearest neighbor search (ANNS) accuracy, the method achieves up to 4.16× higher continuous update throughput than state-of-the-art methods, improving both timeliness and accuracy of dynamic graph indexing for streaming workloads.

📝 Abstract
To meet the demand for large-scale high-dimensional vector approximate nearest neighbor search (ANNS), graph-based ANNS systems have been widely adopted due to their excellent efficiency-accuracy trade-offs. Nevertheless, in dynamic scenarios involving frequent vector insertions and deletions, existing systems mitigate the overhead by employing batch update strategies, which improve update performance by increasing the batch size. However, excessively increasing the batch size delays index updates, which in turn causes a significant degradation in query accuracy. This work aims to improve the performance of graph-based ANNS systems in small-batch update scenarios, achieving a balance between update efficiency and query accuracy. We identify two key issues with existing batch update strategies during small-batch updates: (1) significant data waste in disk read/write operations, and (2) frequent triggering of large-scale pruning operations involving high-cost vector computations by the incremental algorithm. To address these issues, we introduce Greator, a disk-based system with a novel graph-based index update method. The core idea of Greator is to accumulate only a small number of vector updates per batch to prevent excessive index degradation, while designing an efficient fine-grained incremental update scheme that reduces data waste during I/O operations. Additionally, we introduce a lightweight incremental graph repair strategy to reduce pruning operations, thereby minimizing the expensive vector computations. Extensive experiments on real-world datasets show that Greator integrates continuous updates faster than state-of-the-art solutions, achieving up to 4.16× speedup, while maintaining stable index quality that yields low query latency and high query accuracy for approximate vector search.
Problem

Research questions and friction points this paper is trying to address.

Improve graph-based ANNS systems for small-batch updates
Reduce data waste in disk read/write operations
Minimize high-cost vector computations in pruning operations
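
To make the I/O-waste point concrete, the following is a minimal sketch of how little of a disk-resident index a small batch actually touches. It is not Greator's on-disk layout: the fixed-size node records, the 4 KiB block size, and the function names `dirty_blocks` and `write_fraction` are all assumptions chosen for illustration.

```python
def dirty_blocks(updated_ids, record_bytes, block_bytes=4096):
    """Return the sorted block indices holding at least one updated node record.

    Hypothetical layout: node i is stored as a fixed-size record, records are
    packed in ID order, and no record straddles a block boundary.
    """
    per_block = block_bytes // record_bytes
    if per_block < 1:
        raise ValueError("a record must fit inside one block")
    return sorted({node_id // per_block for node_id in updated_ids})


def write_fraction(updated_ids, total_nodes, record_bytes, block_bytes=4096):
    """Fraction of index blocks a block-level update rewrites for this batch."""
    per_block = block_bytes // record_bytes
    total_blocks = -(-total_nodes // per_block)  # ceiling division
    touched = dirty_blocks(updated_ids, record_bytes, block_bytes)
    return len(touched) / total_blocks
```

Under these assumptions, a batch that touches four nodes in a 1,000-node index with 1 KiB records dirties only 3 of 250 blocks, so an update path that rewrites the whole index file would spend almost all of its I/O on unchanged data.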
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained incremental update scheme reduces I/O waste
Lightweight graph repair minimizes costly vector computations
Small-batch updates balance efficiency and query accuracy
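
The idea behind lightweight repair can be sketched as follows: when a node is deleted, its in-neighbors inherit its out-neighbors, and distance computations are paid only if a node ends up over its degree limit. This is an illustrative sketch, not the paper's algorithm. The in-memory `graph` and `vectors` dictionaries, the name `delete_node`, the linear scan for in-neighbors, and the nearest-first truncation used as a stand-in pruning rule are all assumptions.

```python
import math


def delete_node(graph, vectors, victim, max_degree):
    """Remove `victim`, patch its in-neighbors, and return how many were pruned.

    graph:   dict mapping node ID -> list of out-neighbor IDs (hypothetical)
    vectors: dict mapping node ID -> coordinate tuple (hypothetical)
    """
    successors = graph.pop(victim)
    vectors.pop(victim)
    pruned = 0
    for node, neighbors in graph.items():
        if victim not in neighbors:
            continue
        # Reconnect: drop the dangling edge, then inherit the victim's edges.
        patched = [n for n in neighbors if n != victim]
        for s in successors:
            if s != node and s not in patched:
                patched.append(s)
        if len(patched) > max_degree:
            # Only an over-full node pays for distance computations.
            patched.sort(key=lambda n: math.dist(vectors[node], vectors[n]))
            patched = patched[:max_degree]
            pruned += 1
        graph[node] = patched  # existing key, so safe while iterating
    return pruned
```

The returned count shows how often the expensive path runs. With a generous degree limit, most repairs finish with list edits alone, which is the effect the repair strategy aims for.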
Song Yu
Northeastern University, China
Shengyuan Lin
Northeastern University, China
Shufeng Gong
Northeastern University
big data
Yongqing Xie
Huawei Technology Co., Ltd
Ruicheng Liu
Huawei Technology Co., Ltd
Yijie Zhou
The Chinese University of Hong Kong, Shenzhen
Distributed Optimization, Privacy Preserving
Pufan Zuo
Northeastern University, China
Yanfeng Zhang
Northeastern University, China
Database Systems, Machine Learning Systems
Ji Sun
Huawei
database
Ge Yu
Northeastern University, China