Attribute Filtering in Approximate Nearest Neighbor Search: An In-depth Experimental Study

📅 2025-08-22
🤖 AI Summary
Filtering Approximate Nearest Neighbor (ANN) search, i.e., vector similarity search under structured attribute constraints, lacks a systematic analytical framework and a standardized evaluation methodology.

Method: We propose the first comprehensive analysis framework for Filtering ANN, featuring (1) a unified interface and a novel taxonomy grounded in attribute types and filtering strategies; (2) a reproducible experimental platform that decouples the effects of index structures, pruning techniques, and entry-point selection on performance; and (3) a rigorous evaluation on four real-world and synthetic datasets (up to 10M items each), covering ten algorithms and twelve methods under query selectivities from 0.1% to 100%.

Contribution/Results: Our component analysis quantifies how pruning, entry-point selection, and edge-filtering costs affect overall performance, yielding empirically grounded principles for combining these techniques. We provide actionable, application-oriented guidelines for algorithm selection. All code is open-sourced to advance standardization and reproducibility in Filtering ANN research.
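To make the problem definition concrete, here is a minimal brute-force sketch of what a Filtering ANN query is asked to approximate: the exact k nearest neighbors among only those items whose attribute satisfies a predicate, together with the query's selectivity (the fraction of items that pass the filter). It assumes Euclidean distance and a numeric range filter; the function names `selectivity` and `filtered_knn_exact` and the `prices` attribute are hypothetical illustrations, not part of the paper's code.

```python
import math
from typing import Any, Callable, List, Sequence

Vector = Sequence[float]
Predicate = Callable[[Any], bool]


def selectivity(attrs: Sequence[Any], predicate: Predicate) -> float:
    """Fraction of items whose attribute satisfies the filter predicate."""
    if not attrs:
        return 0.0
    return sum(1 for a in attrs if predicate(a)) / len(attrs)


def filtered_knn_exact(vectors: Sequence[Vector], attrs: Sequence[Any],
                       query: Vector, predicate: Predicate, k: int) -> List[int]:
    """Brute-force ground truth: ids of the k nearest vectors (Euclidean)
    among items that satisfy the predicate, sorted by distance."""
    candidates = [
        (math.dist(vec, query), i)
        for i, (vec, a) in enumerate(zip(vectors, attrs))
        if predicate(a)
    ]
    candidates.sort()
    return [i for _, i in candidates[:k]]


# Hypothetical toy data: four 2-D points with one numeric attribute each.
vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
prices = [10, 20, 30, 40]


def in_range(p: int) -> bool:
    # Range filter on the numeric attribute.
    return 25 <= p <= 40


print(selectivity(prices, in_range))                                   # 0.5
print(filtered_knn_exact(vectors, prices, (0.0, 0.0), in_range, k=1))  # [2]
```

A label-style attribute fits the same shape: store a set of tags per item and use a predicate such as `lambda tags: "red" in tags`. Exact search like this is the usual ground truth for measuring recall, but it scans every item, which is what approximate indexes try to avoid.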

📝 Abstract
With the growing integration of structured and unstructured data, new methods have emerged for performing similarity search on vectors while honoring structured attribute constraints, a process known as Filtering Approximate Nearest Neighbor (Filtering ANN) search. Since many of these algorithms have appeared only in recent years and are designed to work with a variety of base indexing methods and filtering strategies, there is a pressing need for a unified analysis that identifies their core techniques and enables meaningful comparisons. In this work, we present a unified Filtering ANN search interface that encompasses the latest algorithms, and we evaluate them extensively from multiple perspectives. First, we propose a comprehensive taxonomy of existing Filtering ANN algorithms based on attribute types and filtering strategies. Next, we analyze their key components, i.e., index structures, pruning strategies, and entry point selection, to elucidate design differences and tradeoffs. We then conduct a broad experimental evaluation of 10 algorithms and 12 methods across 4 datasets (each with up to 10 million items), incorporating both synthetic and real attributes and covering selectivity levels from 0.1% to 100%. Finally, an in-depth component analysis reveals the influence of pruning, entry point selection, and edge filtering costs on overall performance. Based on our findings, we summarize the strengths and limitations of each approach, provide practical guidelines for selecting appropriate methods, and suggest promising directions for future research. Our code is available at: https://github.com/lmccccc/FANNBench.
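Why selectivity matters for the choice of filtering strategy can be seen with two commonly discussed baselines, which we assume here purely for illustration (the paper's own taxonomy may draw its categories differently). Pre-filtering applies the predicate first and searches only the survivors; post-filtering runs an unfiltered search for `k * expansion` candidates and then discards those that fail the predicate. In this sketch an exact search stands in for the unfiltered ANN index, and every name (`pre_filter_search`, `post_filter_search`, `expansion`, `rare`, `common`) is hypothetical.

```python
import math
from typing import Any, Callable, Iterable, List, Sequence

Vector = Sequence[float]
Predicate = Callable[[Any], bool]


def knn(vectors: Sequence[Vector], query: Vector, ids: Iterable[int], k: int) -> List[int]:
    """Exact k-NN over the candidate ids; stands in for an unfiltered ANN index."""
    scored = sorted((math.dist(vectors[i], query), i) for i in ids)
    return [i for _, i in scored[:k]]


def pre_filter_search(vectors, attrs, query, predicate: Predicate, k: int) -> List[int]:
    # Apply the attribute filter first, then search only the surviving items.
    survivors = [i for i, a in enumerate(attrs) if predicate(a)]
    return knn(vectors, query, survivors, k)


def post_filter_search(vectors, attrs, query, predicate: Predicate, k: int,
                       expansion: int = 2) -> List[int]:
    # Search the whole collection for k * expansion candidates, then drop those
    # that fail the filter. Can return fewer than k results when few items pass.
    candidates = knn(vectors, query, range(len(vectors)), k * expansion)
    return [i for i in candidates if predicate(attrs[i])][:k]


def recall(result: List[int], truth: List[int]) -> float:
    """Fraction of the ground-truth ids that the result recovers."""
    return len(set(result) & set(truth)) / len(truth) if truth else 1.0


# Toy data: 100 points on a line; each attribute equals the point's index.
vectors = [(float(i),) for i in range(100)]
attrs = list(range(100))
query = (0.0,)


def rare(a: int) -> bool:
    # Selectivity 10%, and every match lies far from the query.
    return a >= 90


def common(a: int) -> bool:
    # Selectivity 50%, matches spread evenly.
    return a % 2 == 0


for name, pred in [("rare", rare), ("common", common)]:
    truth = pre_filter_search(vectors, attrs, query, pred, k=5)
    got = post_filter_search(vectors, attrs, query, pred, k=5)
    print(name, got, recall(got, truth))
```

Running it prints `rare [] 0.0` and `common [0, 2, 4, 6, 8] 1.0`: post-filtering is cheap and accurate when many items pass the filter, but collapses when matches are rare, whereas pre-filtering stays exact at the cost of scanning the survivors without an index.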
Problem

Evaluating filtering methods for nearest neighbor search with attributes
Analyzing performance tradeoffs in attribute-constrained similarity algorithms
Providing guidelines for selecting optimal filtering ANN techniques
Innovation

Unified interface for Filtering ANN algorithms
Taxonomy based on attribute types and strategies
Component analysis of pruning and entry-point selection
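The component questions above (pruning, entry-point selection, and the edge-filtering cost named in the abstract) can be illustrated on a toy proximity graph. The sketch below is a generic best-first beam search, not any specific algorithm from the study. We assume "edge filtering" means skipping edges to nodes that fail the predicate; with `prune_nonmatching=False` such nodes are traversed but never returned. The function `filtered_graph_search`, its parameters, and the line-shaped graph are all hypothetical.

```python
import heapq
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple


def filtered_graph_search(graph: Dict[int, List[int]], vectors: Sequence[Sequence[float]],
                          attrs: Sequence[Any], query: Sequence[float],
                          predicate: Callable[[Any], bool], entry: int, k: int,
                          beam_width: int = 8,
                          prune_nonmatching: bool = False) -> Tuple[List[int], int]:
    """Best-first search on a proximity graph under an attribute filter.

    Returns (ids of up to k nearest matching nodes found, number of
    distance computations performed).
    """
    n_dist = 0

    def dist(i: int) -> float:
        nonlocal n_dist
        n_dist += 1
        return math.dist(vectors[i], query)

    d0 = dist(entry)
    visited = {entry}
    frontier = [(d0, entry)]  # min-heap: nodes still to expand
    beam = [(-d0, entry)]     # max-heap via negation: best beam_width nodes seen
    matches = [(d0, entry)] if predicate(attrs[entry]) else []

    while frontier:
        d, node = heapq.heappop(frontier)
        if len(beam) >= beam_width and d > -beam[0][0]:
            break  # nothing left in the frontier can improve the beam
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            ok = predicate(attrs[nb])
            if prune_nonmatching and not ok:
                continue  # edge filtering: never traverse non-matching nodes
            dn = dist(nb)
            if ok:
                matches.append((dn, nb))
            if len(beam) < beam_width or dn < -beam[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(beam, (-dn, nb))
                if len(beam) > beam_width:
                    heapq.heappop(beam)  # evict the current worst beam entry
    return [i for _, i in sorted(matches)[:k]], n_dist


# Toy index: 10 points on a line, each linked to its immediate neighbours.
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
vectors = [(float(i),) for i in range(10)]
attrs = list(range(10))


def is_even(a: int) -> bool:
    return a % 2 == 0


q = (9.0,)
print(filtered_graph_search(graph, vectors, attrs, q, is_even, entry=0, k=2))
# ([8, 6], 10)
print(filtered_graph_search(graph, vectors, attrs, q, is_even, entry=0, k=2,
                            prune_nonmatching=True))
# ([0], 1)
print(filtered_graph_search(graph, vectors, attrs, q, is_even, entry=8, k=2,
                            prune_nonmatching=True))
# ([8], 1)
```

In this toy case, edge filtering cuts distance computations from 10 to 1 but disconnects the search, so the result depends entirely on the entry point: starting at a matching node near the query (node 8) recovers the true nearest match, while starting far away (node 0) does not.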
Mocheng Li
The Chinese University of Hong Kong, Shenzhen
Xiao Yan
Wuhan University
Baotong Lu
Microsoft Research
Database Systems, Machine Learning Systems
Yue Zhang
The Chinese University of Hong Kong, Shenzhen
James Cheng
The Chinese University of Hong Kong
Chenhao Ma
The Chinese University of Hong Kong, Shenzhen
Data management, data mining