Decoupling Vector Data and Index Storage for Space Efficiency

📅 2026-04-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high storage overhead, redundant I/O, and write amplification of large-scale disk-based approximate nearest neighbor search (ANNS) systems, all of which stem from storing vector data together with index metadata. The authors propose DecoupleVS, a framework that stores vector data separately from index metadata so that each component can be compressed, laid out, and tuned independently. By combining specialized compression, an efficient query path, and an update mechanism with low write amplification, DecoupleVS reduces storage consumption by up to 58.7% on billion-scale real-world datasets while preserving search accuracy. Its search and update performance is competitive with, or better than, that of state-of-the-art monolithic disk-based ANNS systems.

📝 Abstract
Managing large-scale vector datasets with disk-based approximate nearest neighbor search (ANNS) systems faces critical efficiency challenges stemming from the co-location of vector data and auxiliary index metadata. Our analysis of state-of-the-art ANNS systems reveals that such co-location incurs substantial storage overhead, generates excessive reads during search queries, and causes severe write amplification during updates. We present DecoupleVS, a decoupled vector storage management framework that enables specialized optimizations for vector data and auxiliary index metadata. DecoupleVS incorporates various design techniques for effective compression, data layouts, search queries, and updates, so as to significantly reduce storage space, while maintaining high search and update performance and high search accuracy. Evaluation on real-world public and proprietary billion-scale datasets shows that DecoupleVS reduces storage space by up to 58.7%, while delivering competitive or improved search query and update performance, compared to state-of-the-art monolithic disk-based ANNS systems.
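
To make the co-location overhead concrete, the toy accounting below compares the on-disk footprint of two layouts. It is only an illustrative sketch, not the DecoupleVS design: the 4 KiB sector size, the record format (vector, neighbor count, neighbor IDs), the rule that records never straddle a sector, and the function names are all assumptions introduced here, and no compression is modeled.

```python
SECTOR = 4096  # bytes; assumed disk sector/page size (hypothetical)


def colocated_bytes(n, dim, degree, vec_elem=4, id_bytes=4):
    """Footprint when each node's vector and neighbor list share one record.

    Assumed record layout: vector, then a neighbor count, then neighbor IDs.
    Records are assumed never to straddle a sector, so the unused tail of
    every sector is padding.
    """
    record = dim * vec_elem + id_bytes + degree * id_bytes
    per_sector = SECTOR // record
    if per_sector == 0:
        raise ValueError("record does not fit in one sector")
    sectors = -(-n // per_sector)  # ceiling division
    return sectors * SECTOR


def decoupled_bytes(n, dim, degree, vec_elem=4, id_bytes=4):
    """Footprint when vectors and neighbor lists live in separate packed files."""
    vectors = n * dim * vec_elem
    graph = n * (id_bytes + degree * id_bytes)
    return vectors + graph


if __name__ == "__main__":
    n, dim, degree = 1000, 128, 64  # hypothetical dataset parameters
    co = colocated_bytes(n, dim, degree)
    de = decoupled_bytes(n, dim, degree)
    print(f"co-located: {co} B, decoupled: {de} B, saved: {1 - de / co:.1%}")
```

Under these assumptions the saving comes purely from sector padding. Once the two components are stored apart, each file can also be given its own compression scheme and layout, which is where the abstract attributes the larger reductions.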
Problem

Research questions and friction points this paper is trying to address.

vector storage
approximate nearest neighbor search
storage overhead
write amplification
index metadata
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Storage
Vector Data Management
Approximate Nearest Neighbor Search (ANNS)
Storage Efficiency
Index Metadata