PipeANN-Filter: An Efficient Filtered Vector Search System on SSD

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing systems for attribute-constrained (filtered) vector search on SSD incur high I/O overhead and latency, largely because they read attribute data frequently during search. PipeANN-Filter instead uses probabilistic data structures, such as Bloom filters, to identify a superset of the valid vectors, which greatly reduces attribute reads from SSD at the cost of exploring a small number of false positives. It runs approximate nearest neighbor search over this superset and verifies attributes only after obtaining the top-k closest results, so the returned results still satisfy the constraints. According to the authors' evaluation, this improves search latency and throughput over state-of-the-art systems. The implementation is open source.
📝 Abstract
We propose PipeANN-Filter, an efficient filtered vector search system on SSD. Unlike existing systems that explore only valid vectors (i.e., those satisfying the attribute constraints) during search, PipeANN-Filter explores a superset of valid vectors, and performs attribute verification after getting the top-k closest result vectors. This allows PipeANN-Filter to leverage probabilistic data structures (e.g., Bloom filters) to identify the superset, trading off a small number of false-positive vector explorations for a massive reduction in SSD I/O for attribute reading. Evaluations show that PipeANN-Filter improves search latency and throughput compared to state-of-the-art systems. PipeANN-Filter is open-source at https://github.com/thustorage/PipeANN
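The superset-then-verify idea in the abstract can be illustrated with a small sketch. This is not the PipeANN-Filter implementation: the BloomFilter class, the filtered_search function, the in-memory dictionaries standing in for SSD-resident data, and the brute-force distance ranking standing in for graph-based ANN search are all assumptions made for illustration. The sketch also verifies candidates in distance order until k valid ones are found, which is one simple way to realize post-search verification.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                f"{seed}:{item}".encode(), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def filtered_search(query, vectors, label_filter, read_attribute, wanted, k):
    """Hypothetical helper: search a superset, then verify attributes."""
    # Step 1: the in-memory filter selects a superset of the valid vectors.
    # No exact attribute is read here, so no attribute I/O is spent.
    superset = [vid for vid in vectors if vid in label_filter]
    # Step 2: rank the superset by distance (brute force stands in for ANN).
    superset.sort(key=lambda vid: math.dist(query, vectors[vid]))
    # Step 3: read exact attributes only for the closest candidates and
    # drop any false positives the filter let through.
    results = []
    for vid in superset:
        if read_attribute(vid) == wanted:
            results.append(vid)
            if len(results) == k:
                break
    return results


# Toy data: vector id -> coordinates, and vector id -> exact attribute.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 2.0), 3: (3.0, 3.0), 4: (0.5, 0.5)}
labels = {0: "red", 1: "blue", 2: "red", 3: "red", 4: "blue"}

# One filter for the label "red", built from the ids that carry it.
red_filter = BloomFilter(num_bits=1024, num_hashes=3)
for vid, label in labels.items():
    if label == "red":
        red_filter.add(vid)

attribute_reads = 0


def read_attribute(vid):
    """Stand-in for an exact attribute read from SSD; counts accesses."""
    global attribute_reads
    attribute_reads += 1
    return labels[vid]


results = filtered_search((0.0, 0.0), vectors, red_filter, read_attribute, "red", k=2)
print(results, attribute_reads)
```

Because a Bloom filter never reports a stored item as absent, every valid vector is in the superset, so verification can only remove candidates and never misses a valid one. The exact attribute is read only for the few candidates closest to the query rather than for every vector visited during search.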
Problem

Research questions and friction points this paper is trying to address.

filtered vector search
SSD
attribute constraints
vector search system
I/O efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

filtered vector search
SSD-based retrieval
Bloom filter
approximate nearest neighbor
I/O optimization