🤖 AI Summary
This work addresses the sharp drop in retrieval efficiency that graph indexes suffer on low-selectivity filtered queries, where qualifying vectors are sparse and the graph connectivity among them fragments. The authors argue for a dual-index architecture and propose Curator, a partition-based index that complements existing graph indexes rather than replacing them. Curator builds a dedicated sub-index for each label inside a shared clustering tree, so each sub-index adapts to the distribution of its qualifying vectors while the shared structure keeps memory overhead low. It supports incremental updates and handles complex predicates beyond single-label filters by constructing temporary indexes on the fly at query time. Experiments show that integrating Curator with state-of-the-art graph indexes reduces low-selectivity query latency by up to 20.9x compared to pre-filtering fallback, while increasing construction time by only 5.5% and memory footprint by only 4.3%.
📝 Abstract
Embedding-based dense retrieval has become the cornerstone of many critical applications, where approximate nearest neighbor search (ANNS) queries are often combined with filters on labels such as dates and price ranges. Graph-based indexes achieve state-of-the-art performance on unfiltered ANNS but encounter connectivity breakdown on low-selectivity filtered queries, where qualifying vectors become sparse and the graph structure among them fragments. Recent research proposes specialized graph indexes that address this issue by expanding graph degree, which incurs prohibitively high construction costs. Given these inherent limitations of graph-based methods, we argue for a dual-index architecture and present Curator, a partition-based index that complements existing graph-based approaches for low-selectivity filtered ANNS. Curator builds specialized indexes for different labels within a shared clustering tree, where each index adapts to the distribution of its qualifying vectors to ensure efficient search while sharing structure to minimize memory overhead. The system also supports incremental updates and handles arbitrary complex predicates beyond single-label filters by efficiently constructing temporary indexes on the fly. Our evaluation demonstrates that integrating Curator with state-of-the-art graph indexes reduces low-selectivity query latency by up to 20.9x compared to pre-filtering fallback, while increasing construction time and memory footprint by only 5.5% and 4.3%, respectively.
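The dual-index idea in the abstract can be illustrated with a small routing sketch. This is not Curator's implementation: the function names (`selectivity`, `route`, `prefilter_search`), the 1% threshold, and the linear-scan selectivity estimate are all assumptions made for illustration. It shows only the dispatch decision (low-selectivity queries go to a partition-based index, the rest to a graph index) together with the pre-filtering fallback that the abstract uses as its baseline.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def l2_sq(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def selectivity(labels: List[Set[str]], label: str) -> float:
    """Fraction of vectors that carry `label` (computed by a linear scan here)."""
    return sum(label in ls for ls in labels) / len(labels)


def route(labels: List[Set[str]], label: str, threshold: float = 0.01) -> str:
    """Pick an index for a filtered query.

    The threshold is a hypothetical tuning knob, not a value from the paper.
    """
    return "partition" if selectivity(labels, label) < threshold else "graph"


def prefilter_search(
    vectors: List[Vector],
    labels: List[Set[str]],
    query: Vector,
    label: str,
    k: int,
) -> List[int]:
    """Pre-filtering fallback: exact scan over the qualifying vectors only."""
    candidates = [i for i, ls in enumerate(labels) if label in ls]
    candidates.sort(key=lambda i: l2_sq(vectors[i], query))
    return candidates[:k]


if __name__ == "__main__":
    vectors = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    labels = [{"a"}, {"b"}, {"a", "b"}, {"b"}]
    print(route(labels, "a"))
    print(prefilter_search(vectors, labels, [3.0, 3.0], "a", k=1))
```

In a real system the selectivity estimate would come from label statistics maintained by the index rather than from a scan, and the two branches would call actual index implementations in place of returning a string.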