Unified and Efficient Approach for Multi-Vector Similarity Search

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing methods in multi-vector similarity search, which rely on single-vector indexing and neglect inter-vector relationships, thereby struggling to balance recall and efficiency. The paper proposes MV-HNSW, the first native hierarchical navigable small-world graph index tailored for multi-vector data, enabling efficient and accurate retrieval by explicitly modeling vector dependencies. Key innovations include a novel edge weighting function that ensures symmetry, cardinality robustness, and query consistency, as well as a dynamic search strategy that enhances discovery of topologically isolated yet semantically relevant candidates. Extensive experiments on seven real-world datasets demonstrate that MV-HNSW achieves state-of-the-art performance, reducing search latency by up to 14× while maintaining recall above 90%.
📝 Abstract
Multi-Vector Similarity Search is essential for fine-grained semantic retrieval in many real-world applications, offering richer representations than traditional single-vector paradigms. Because no native multi-vector index exists, existing methods rely on a filter-and-refine framework built upon single-vector indexes. By treating the token vectors within each multi-vector object in isolation and ignoring their correlations, these methods face an inherent dilemma: aggressive filtering sacrifices recall, while conservative filtering incurs prohibitive computational cost during refinement. To address this limitation, we propose MV-HNSW, the first native hierarchical graph index designed for multi-vector data. MV-HNSW introduces a novel edge-weight function that satisfies essential properties (symmetry, cardinality robustness, and query consistency) for graph-based indexing, an accelerated multi-vector similarity computation algorithm, and an augmented search strategy that dynamically discovers topologically disconnected yet relevant candidates. Extensive experiments on seven real-world datasets show that MV-HNSW achieves state-of-the-art search performance, maintaining over 90% recall while reducing search latency by up to 14.0× compared to existing methods.
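The filter-and-refine baseline that the abstract criticizes can be sketched roughly as follows. This is an illustration of the baseline only, not of MV-HNSW. Several things here are assumptions that the abstract does not state: the similarity function (a MaxSim-style sum of per-query-vector maxima, common in late-interaction retrieval), the brute-force sort standing in for a single-vector index, and every name (`maxsim`, `filter_and_refine`, `per_vector_k`, the toy data).

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query, obj):
    """Assumed multi-vector similarity: for each query vector, take its best
    match among the object's vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in obj) for q in query)


def filter_and_refine(query, objects, per_vector_k, top_k):
    """Hypothetical filter-and-refine search over multi-vector objects.

    Filter: each query vector independently retrieves its per_vector_k nearest
    token vectors (a brute-force sort stands in for a single-vector index).
    Refine: the exact multi-vector similarity is computed only for the objects
    that own at least one retrieved token vector.
    Returns (ranked object ids, number of refined candidates).
    """
    # Flatten every token vector, remembering which object owns it.
    flat = [(oid, vec) for oid, obj in enumerate(objects) for vec in obj]

    candidates = set()
    for q in query:
        nearest = sorted(flat, key=lambda item: dot(q, item[1]), reverse=True)
        candidates.update(oid for oid, _ in nearest[:per_vector_k])

    ranked = sorted(candidates, key=lambda oid: maxsim(query, objects[oid]),
                    reverse=True)
    return ranked[:top_k], len(candidates)


# Toy data (hypothetical): four objects with one or two 2-d token vectors each.
objects = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (0.9, 0.1)],
    [(0.0, 1.0)],
    [(-1.0, 0.0), (0.0, -1.0)],
]
query = [(1.0, 0.0), (0.0, 1.0)]

aggressive = filter_and_refine(query, objects, per_vector_k=1, top_k=2)
conservative = filter_and_refine(query, objects, per_vector_k=7, top_k=2)
```

On this toy data, the aggressive setting (`per_vector_k=1`) refines a single candidate and therefore cannot return a second result, while the conservative setting (`per_vector_k=7`) recovers both top objects but refines all four. That is the recall-versus-cost trade-off described above, at miniature scale.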
Problem

Research questions and friction points this paper is trying to address.

Multi-Vector Similarity Search
filter-and-refine framework
recall
computational cost
native multi-vector index
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-vector similarity search
native hierarchical graph index
edge-weight function
accelerated similarity computation
augmented search strategy
👥 Authors
Binhan Yang
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China
Yuxiang Zeng
Beihang University
Vector Databases, Federated Databases, Spatial Data Analytics
Hengxin Zhang
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China
Zhuanglin Zheng
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China
Yunzhen Chi
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China
Yongxin Tong
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China
Ke Xu
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, China