🤖 AI Summary
Existing graph-based indexes (e.g., HNSW) suffer from poor connectivity, low recall, and high latency when nearest-neighbor search must satisfy complex hard filtering predicates. This paper proposes a multi-index ensemble architecture coupled with a learnable three-dimensional analytical model that selects, at query time, the index expected to give the fastest search at a target recall. The approach combines workload-aware index packing, parallel multi-index construction, and a lightweight HNSW variant, achieving high recall while substantially improving efficiency. Experiments show speedups of up to 8.06× over state-of-the-art baselines, build times as low as 1% of those of competing indexes, and a memory footprint under 2.15× that of a standard HNSW graph. The core idea is to move filtering out of graph traversal and into a collection of indexes, each serving different predicate forms, with the analytical model choosing among them per query; this avoids the performance bottlenecks of constrained graph traversal in predicate-aware ANN search.
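The per-query index selection described above can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the summary: predicates are conjunctions of tags, an index built for a tag set stores every vector carrying those tags, the caller already knows how many vectors match the query, and `predicted_time` is a hypothetical toy formula standing in for the paper's learnable three-dimensional model (index size, search time, recall), whose actual form is not given here. `IndexInfo`, `predicted_time`, and `select_index` are illustrative names only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexInfo:
    name: str
    predicate: frozenset  # tags every vector in this index carries (hypothetical form)
    size: int             # number of vectors stored in the index


def predicted_time(size: int, selectivity: float, recall: float) -> float:
    """Hypothetical toy cost model, NOT the paper's learned model.

    Cost grows with log(size), with the extra effort needed as recall -> 1,
    and inversely with the fraction of indexed vectors that pass the query filter.
    """
    assert 0.0 < recall < 1.0 and 0.0 < selectivity <= 1.0
    effort = 1.0 + math.log(1.0 / (1.0 - recall))
    return math.log2(size + 1) * effort / selectivity


def select_index(indexes, query_tags, num_matches, target_recall):
    """Return the usable index with the lowest predicted time, or None."""
    best, best_time = None, math.inf
    for idx in indexes:
        # Usable only if every vector matching the query lives in this index,
        # i.e. the index predicate is implied by the query predicate.
        if idx.size == 0 or not idx.predicate <= query_tags:
            continue
        t = predicted_time(idx.size, num_matches / idx.size, target_recall)
        if t < best_time:
            best, best_time = idx, t
    return best


base = IndexInfo("base", frozenset(), 1_000_000)    # unfiltered fallback index
kids = IndexInfo("kids", frozenset({"kids"}), 20_000)
print(select_index([base, kids], frozenset({"kids", "music"}), 500, 0.95).name)  # kids
```

The unfiltered base index is always usable, so it acts as a fallback; a smaller, more specific index wins whenever the query's matches make up a large enough share of it.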
📝 Abstract
Many real-world tasks, such as recommending videos with the kids tag, reduce to finding the most similar vectors that satisfy hard predicates. This task, filtered vector search, is challenging because prior state-of-the-art graph-based (unfiltered) similarity search techniques degenerate quickly once hard constraints are imposed. That is, effective graph-based filtered similarity search relies on sufficient connectivity to reach the most similar items within just a few hops. To handle predicates, recent works modify graph traversal to visit only the items that may satisfy them. However, they fail to preserve the just-a-few-hops property across a wide range of predicates: they must either restrict the supported predicates significantly or lose efficiency when only a small fraction of items satisfy the predicate.
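To make the task concrete, here is a minimal exact baseline for filtered vector search, assuming each item is an `(id, vector, tags)` tuple and the predicate is an arbitrary callable over tags (both hypothetical representations, not the paper's). A linear scan like this returns the ground-truth answers that filtered ANN indexes try to approximate without touching every item.

```python
import heapq
import math


def filtered_knn(query, items, predicate, k):
    """Return ids of the k items nearest to `query` (Euclidean) among those
    whose tags satisfy `predicate`. Exact, O(n) per query."""
    candidates = (
        (math.dist(query, vec), item_id)
        for item_id, vec, tags in items
        if predicate(tags)
    )
    # nsmallest orders by distance first, then by id to break ties.
    return [item_id for _, item_id in heapq.nsmallest(k, candidates)]


items = [
    ("a", (0.0, 0.0), {"kids"}),
    ("b", (1.0, 0.0), {"news"}),
    ("c", (2.0, 0.0), {"kids"}),
    ("d", (0.5, 0.0), {"kids", "music"}),
]
print(filtered_knn((0.0, 0.0), items, lambda tags: "kids" in tags, k=2))  # ['a', 'd']
```

Item `b` is excluded despite being closer to the query than `c`, because it fails the kids predicate.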
We propose the opposite approach: instead of constraining traversal, we build many indexes, each serving a different predicate form. To construct them effectively, we devise a three-dimensional analytical model that captures the relationships among index size, search time, and recall, and use it in a workload-aware procedure that packs as many useful indexes as possible into a collection. At query time, the same analytical model selects the index that offers the fastest search at a given recall. Experiments on datasets with varying selectivities and predicate forms show superior performance and broad predicate support: our approach achieves up to 8.06x speedup with as little as 1% of the build time of other indexes, while using less than 2.15x the memory of a standard HNSW graph and requiring only modest knowledge of past workloads.
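The workload-aware packing step can be illustrated with a simple greedy heuristic. This is a swapped-in stand-in, not the paper's construction procedure: it assumes predicates are conjunctions of tags, treats the number of stored vectors as the memory cost, uses a hypothetical `est_cost` formula in place of the analytical model, and repeatedly adds the candidate index that most reduces predicted workload time per stored vector until the memory budget is reached. `Candidate`, `est_cost`, `workload_cost`, and `pack_indexes` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    predicate: frozenset  # index stores every vector carrying all these tags
    size: int             # vectors stored; used here as the memory cost


def est_cost(size, selectivity, recall=0.95):
    # Hypothetical stand-in for the paper's size/time/recall analytical model.
    return math.log2(size + 1) * (1.0 + math.log(1.0 / (1.0 - recall))) / selectivity


def workload_cost(chosen, workload):
    """Predicted total time when each query (tags, num_matches) uses its cheapest
    usable index; assumes the unfiltered base index is always in `chosen`."""
    return sum(
        min(est_cost(c.size, matches / c.size) for c in chosen if c.predicate <= tags)
        for tags, matches in workload
    )


def pack_indexes(base, candidates, workload, budget):
    """Greedily add the candidate with the largest predicted saving per stored
    vector until no remaining candidate both fits the budget and helps."""
    chosen, used = [base], base.size
    remaining = list(candidates)
    current = workload_cost(chosen, workload)
    while True:
        best, best_ratio, best_cost = None, 0.0, current
        for cand in remaining:
            if used + cand.size > budget:
                continue  # would exceed the memory budget
            new_cost = workload_cost(chosen + [cand], workload)
            ratio = (current - new_cost) / cand.size
            if ratio > best_ratio:
                best, best_ratio, best_cost = cand, ratio, new_cost
        if best is None:
            return chosen
        chosen.append(best)
        used += best.size
        remaining.remove(best)
        current = best_cost


base = Candidate(frozenset(), 1_000_000)
cands = [
    Candidate(frozenset({"kids"}), 20_000),
    Candidate(frozenset({"music"}), 50_000),
    Candidate(frozenset({"news"}), 300_000),
]
workload = [
    (frozenset({"kids"}), 20_000),
    (frozenset({"kids", "music"}), 500),
    (frozenset({"news"}), 300_000),
]
packed = pack_indexes(base, cands, workload, budget=1_100_000)
print([sorted(c.predicate) for c in packed])  # [[], ['kids']]
```

Under this toy model the `music` index is never added: the `kids` index already narrows the `kids` plus `music` query more cheaply, so the greedy step sees no saving, while `news` does not fit the budget.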