🤖 AI Summary
This work addresses the high storage overhead and low computational efficiency of approximate nearest neighbor search with arbitrary filtering predicates (AFANNS) by proposing MCI, a graph index built on maximal clique covering. MCI uses a maximal clique cover to compress a dense nearest neighbor graph, and combines it with local neighborhood geometric densification and a lock-free parallel construction mechanism. This reduces both memory consumption and computational cost while preserving graph connectivity. Experiments on ten datasets show that MCI achieves up to an order of magnitude higher query throughput (QPS) at high recall than state-of-the-art methods, uses substantially less memory, and remains competitive on range and keyword filtering tasks.
📝 Abstract
Approximate Nearest Neighbor Search with arbitrary filtering predicates (AFANNS) is essential for modern data applications, yet existing methods often incur substantial storage and computational costs. In this work, we introduce the Maximal Clique Index (MCI), a novel graph-based index designed for robust and efficient AFANNS. The core idea of MCI is to approximate a dense Nearest Neighbor Graph (NNG) through a compact, clique-based representation. We propose two key techniques: (1) Maximal Clique Cover (MCC), which exploits the geometric transitivity of high-dimensional spaces to encode dense neighborhoods as maximal cliques, achieving an index with high compression and connectivity; and (2) Local Neighborhood Graph Geometric Densification, a strategy that constructs an index approximating a large NNG from a sparse initial NNG and recovers global connectivity by progressively increasing distance thresholds to locally densify the structure. The index is built in a lock-free parallel manner for scalability and queried via a carefully designed multi-seed strategy to handle fragmented predicate-induced subgraphs. Extensive experiments on 10 datasets show that MCI significantly outperforms state-of-the-art methods by up to one order of magnitude in QPS at high recall while using substantially less space, and remains competitive even on range/keyword filtering tasks, demonstrating robust general-purpose performance.
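To make the clique-based compression idea concrete, the following is a minimal sketch of a generic greedy edge clique cover on a small undirected neighbor graph. It is not the paper's MCC algorithm, whose details the abstract does not give; the function names `greedy_clique_cover` and `neighbors_from_cliques` and the toy graph are illustrative assumptions. The sketch only shows why dense neighborhoods can be stored as a few vertex sets instead of many explicit edges.

```python
from itertools import combinations


def greedy_clique_cover(adj):
    """Cover every edge of an undirected graph with maximal cliques.

    `adj` maps each vertex to the set of its neighbors (no self-loops).
    This is a simple greedy heuristic, assumed here for illustration only.
    """
    covered = set()   # edges (a, b) with a < b already inside some clique
    cliques = []
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u >= v or (u, v) in covered:
                continue
            # Grow a clique from the uncovered edge (u, v).
            clique = {u, v}
            candidates = adj[u] & adj[v]   # common neighbors of all members
            while candidates:
                w = min(candidates)        # deterministic choice
                clique.add(w)
                candidates = candidates & adj[w]
            # No candidate is left, so the clique is maximal.
            for a, b in combinations(sorted(clique), 2):
                covered.add((a, b))
            cliques.append(frozenset(clique))
    return cliques


def neighbors_from_cliques(v, cliques):
    """Recover the neighbor set of `v` from the clique cover alone."""
    result = set()
    for clique in cliques:
        if v in clique:
            result |= clique
    result.discard(v)
    return result


# Toy graph: vertices 0..3 are mutually adjacent, and 3 is also linked to 4.
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3},
}
cliques = greedy_clique_cover(adj)
adjacency_entries = sum(len(nbrs) for nbrs in adj.values())   # 14
clique_entries = sum(len(c) for c in cliques)                 # 6
```

On this toy graph the seven edges collapse into two cliques, `{0, 1, 2, 3}` and `{3, 4}`, so the cover stores 6 vertex entries where the adjacency lists store 14, and every neighbor set can still be recovered exactly from the cliques. The saving grows with clique size, which is the intuition behind encoding dense neighborhoods as cliques.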