🤖 AI Summary
Existing hybrid approximate nearest neighbor search (hybrid ANNS) methods overlook heterogeneity in data distribution, in particular discrepancies in similarity magnitude and sensitivity to attribute cardinality. This work proposes STABLE, a framework that addresses both issues jointly. STABLE introduces the enhanced heterogeneous semantic perception (AUTO) metric, which measures feature similarity and attribute consistency together; builds the heterogeneous semantic relation graph (HELP) index on top of AUTO; and applies a Dynamic Heterogeneity Routing method for efficient search. In experiments on five feature vector benchmarks with varying attribute cardinalities, the authors report that STABLE outperforms existing methods in accuracy, efficiency, and robustness.
📝 Abstract
Hybrid Approximate Nearest Neighbor Search (Hybrid ANNS) is a foundational search technology for large-scale heterogeneous data and has gained significant attention in both academia and industry. However, current approaches overlook the heterogeneity in data distribution and therefore ignore two major challenges: the Compatibility Barrier for Similarity Magnitude Heterogeneity and the Tolerance Bottleneck to Attribute Cardinality. To overcome these issues, we propose the robuSt heTerogeneity-Aware hyBrid retrievaL framEwork, STABLE, designed for accurate, efficient, and robust hybrid ANNS on datasets with various distributions. Specifically, we introduce an enhAnced heterogeneoUs semanTic perceptiOn (AUTO) metric that jointly measures feature similarity and attribute consistency, addressing similarity magnitude heterogeneity and improving robustness to datasets with various attribute cardinalities. We then construct our Heterogeneous sEmantic reLation graPh (HELP) index based on AUTO to organize heterogeneous semantic relations. Finally, we employ a novel Dynamic Heterogeneity Routing method to ensure efficient search. Extensive experiments on five feature vector benchmarks with various attribute cardinalities demonstrate the superior performance of STABLE.
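For readers new to the problem setting, the following is a minimal sketch of the exact (brute-force) hybrid search that hybrid ANNS methods approximate. It is not STABLE, AUTO, or HELP: it assumes the attribute constraint is an exact match on a tuple of categorical values and that feature distance is Euclidean. The function name `hybrid_knn_bruteforce` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Attrs = Tuple[int, ...]  # assumed: categorical attributes encoded as ints


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_knn_bruteforce(
    query_vec: Vector,
    query_attrs: Attrs,
    vectors: List[Vector],
    attrs: List[Attrs],
    k: int,
) -> List[int]:
    """Exact hybrid k-NN baseline (illustrative only).

    Keeps the items whose attributes equal the query's attributes,
    then ranks the survivors by feature distance to the query.
    """
    candidates = [i for i, a in enumerate(attrs) if a == query_attrs]
    candidates.sort(key=lambda i: l2(query_vec, vectors[i]))
    return candidates[:k]


# Toy data: item 1 is close in feature space but has different attributes.
vectors = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.2, 0.1)]
attrs = [(1, 0), (0, 0), (1, 0), (1, 0)]

result = hybrid_knn_bruteforce((0.0, 0.0), (1, 0), vectors, attrs, k=2)
print(result)
```

This baseline scans every item, so its cost grows linearly with the dataset size. Index-based hybrid ANNS methods aim to return nearly the same results without a full scan.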