SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors

📅 2025-09-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address performance bottlenecks in sparse vector Approximate Maximum Inner Product Search (MIPS) for RAG multi-path retrieval—namely redundant distance computations, random memory access, and lack of SIMD acceleration—this paper proposes SINDI, a novel indexing framework. Methodologically, SINDI introduces three key innovations: (1) a magnitude-aware vector pruning strategy that retains only high-magnitude elements, enabling non-redundant inner product computation; (2) sequential memory access patterns, replacing the random accesses inherent in inverted-index-based approaches; and (3) integrated compressed sparse storage with batched SIMD-parallelized inner product evaluation. Under Recall@50 ≥ 99%, SINDI achieves 4.2×–26.4× higher single-threaded QPS than SEISMIC and PyANNs. The framework has been production-deployed in VSAG, Ant Group’s open-source vector search library.
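The magnitude-aware pruning idea described in the summary can be sketched roughly as follows; the function name and the `keep_ratio` parameter are illustrative assumptions, not the paper's actual API or pruning criterion (which may use a fixed count or a mass threshold instead):

```python
import numpy as np

def prune_sparse_vector(indices, values, keep_ratio=0.5):
    """Keep only the highest-magnitude entries of a sparse vector.

    `keep_ratio` is a hypothetical knob for this sketch; the paper's
    pruning rule may differ.
    """
    k = max(1, int(len(values) * keep_ratio))
    # Rank entries by absolute value and keep the top-k.
    order = np.argsort(-np.abs(values))[:k]
    keep = np.sort(order)  # restore ascending dimension order
    return indices[keep], values[keep]
```

Because inner products are dominated by the largest-magnitude terms, dropping low-magnitude entries shrinks the index and the per-query work while changing scores only slightly.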

📝 Abstract
Sparse vector Maximum Inner Product Search (MIPS) is crucial in multi-path retrieval for Retrieval-Augmented Generation (RAG). Recent inverted index-based and graph-based algorithms have achieved high search accuracy with practical efficiency. However, their performance in production environments is often limited by redundant distance computations and frequent random memory accesses. Furthermore, the compressed storage format of sparse vectors hinders the use of SIMD acceleration. In this paper, we propose the sparse inverted non-redundant distance index (SINDI), which incorporates three key optimizations: (i) Efficient Inner Product Computation: SINDI leverages SIMD acceleration and eliminates redundant identifier lookups, enabling batched inner product computation; (ii) Memory-Friendly Design: SINDI replaces random memory accesses to original vectors with sequential accesses to inverted lists, substantially reducing memory-bound latency; (iii) Vector Pruning: SINDI retains only the high-magnitude non-zero entries of vectors, improving query throughput while maintaining accuracy. We evaluate SINDI on multiple real-world datasets. Experimental results show that SINDI achieves state-of-the-art performance across datasets of varying scales, languages, and models. On the MsMarco dataset, when Recall@50 exceeds 99%, SINDI delivers single-thread queries-per-second (QPS) improvements ranging from 4.2 to 26.4 times compared with SEISMIC and PyANNs. Notably, SINDI has been integrated into Ant Group's open-source vector search library, VSAG.
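As a rough illustration of the inverted-list design the abstract describes, the sketch below stores per-dimension posting lists as contiguous `(doc_ids, values)` arrays and accumulates partial inner products by scanning them sequentially; NumPy's vectorized update stands in for SIMD batching. All names and the exact layout are assumptions for illustration, not the paper's implementation:

```python
import numpy as np
from collections import defaultdict

def build_inverted_index(docs):
    """Build per-dimension posting lists: dim -> (doc_ids, values).

    Contiguous arrays let the query phase scan each list sequentially
    instead of chasing pointers back to the original vectors.
    """
    postings = defaultdict(lambda: ([], []))
    for doc_id, (indices, values) in enumerate(docs):
        for dim, val in zip(indices, values):
            postings[dim][0].append(doc_id)
            postings[dim][1].append(val)
    return {d: (np.array(ids), np.array(vals))
            for d, (ids, vals) in postings.items()}

def search(postings, q_indices, q_values, n_docs, top_k=10):
    """Score all documents against a sparse query via posting-list scans."""
    scores = np.zeros(n_docs)
    for dim, qv in zip(q_indices, q_values):
        if dim in postings:
            ids, vals = postings[dim]
            scores[ids] += qv * vals  # one contiguous, vectorizable batch
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]
```

Each query dimension touches one dense run of memory, so no per-element identifier comparison is needed and the multiply-accumulate over a whole posting list can be batched.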
Problem

Research questions and friction points this paper is trying to address.

Optimizing sparse vector maximum inner product search efficiency
Reducing redundant computations and random memory accesses
Enabling SIMD acceleration despite compressed storage formats
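The "redundant computations" friction point can be seen in the classic merge-based sparse dot product, sketched below: every step branches on an identifier comparison before it can multiply, which defeats SIMD batching. This is an illustrative baseline, not code from the paper:

```python
def sparse_dot_merge(a_idx, a_val, b_idx, b_val):
    """Baseline sparse-sparse inner product over two sorted index lists.

    The per-element identifier comparisons and data-dependent branches
    here are exactly what inverted-list accumulation avoids.
    """
    i = j = 0
    total = 0.0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            total += a_val[i] * b_val[j]
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            i += 1
        else:
            j += 1
    return total
```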
Innovation

Methods, ideas, or system contributions that make the work stand out.

SIMD acceleration for batched inner product computation
Sequential memory access to reduce latency
Vector pruning to improve query throughput while preserving accuracy
Authors

Ruoxuan Li (Columbia University)
Xiaoyao Zhong (Ant Group, Shanghai, China)
Jiabao Jin (Ant Group)
Peng Cheng (Tongji University & ECNU, Shanghai, China)
Wangze Ni (Zhejiang University, Hangzhou, China)
Lei Chen (HKUST (GZ) & HKUST, Guangzhou & HK SAR, China)
Zhitao Shen (Ant Group)
Wei Jia (Ant Group, Shanghai, China)
Xiangyu Wang (Curtin University)
Xuemin Lin (Shanghai Jiaotong University, Shanghai, China)
Heng Tao Shen (Tongji University, Shanghai, China)
Jingkuan Song (Tongji University, Shanghai, China)