Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data

📅 2025-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Research on filtered approximate nearest neighbor search (FANNS) lacks a systematic survey addressing vector-scalar hybrid data, and suffers from inconsistent problem formulations, the absence of a unified algorithm taxonomy, and insufficient analysis of query difficulty. Method: The survey formally defines hybrid datasets, hybrid queries, and the corresponding evaluation metrics; proposes a fine-grained algorithm taxonomy centered on pruning mechanisms; and analyzes query difficulty in terms of the distribution relationships between data and queries. It also reviews representative hybrid datasets and releases open-source code for downloading hybrid datasets and analyzing query difficulty. Contribution/Results: The work presents itself as the first dedicated survey of FANNS over hybrid data. It aims to provide a structured foundation for the field, more meaningful comparisons between FANNS algorithms, and practical recommendations for practitioners.

📝 Abstract

Filtered approximate nearest neighbor search (FANNS), an extension of approximate nearest neighbor search (ANNS) that incorporates scalar filters, has been widely applied to constrained retrieval of vector data. Despite its growing importance, no dedicated survey on FANNS over vector-scalar hybrid data currently exists, and the field has several problems, including inconsistent definitions of the search problem, an inadequate framework for algorithm classification, and incomplete analysis of query difficulty. This survey formally defines the concepts of hybrid dataset and hybrid query, as well as the corresponding evaluation metrics. Based on these definitions, a pruning-focused framework is proposed to classify and summarize existing algorithms, providing a broader and finer-grained classification than existing frameworks. In addition, representative hybrid datasets are reviewed, followed by an analysis of the difficulty of hybrid queries from the perspective of the distribution relationships between data and queries. This paper aims to establish a structured foundation for FANNS over vector-scalar hybrid data, facilitate more meaningful comparisons between FANNS algorithms, and offer practical recommendations for practitioners. The code used for downloading hybrid datasets and analyzing query difficulty is available at https://github.com/lyj-fdu/FANNS

Problem

Research questions and friction points this paper is trying to address.

Lack of a dedicated survey on filtered approximate nearest neighbor search (FANNS) over vector-scalar hybrid data

Inconsistent definitions of the search problem and an inadequate classification framework for FANNS algorithms

Incomplete analysis of query difficulty in vector-scalar hybrid data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines hybrid dataset and query concepts

Proposes a pruning-focused framework for classifying FANNS algorithms

Analyzes query difficulty via data-query distribution
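
To make the notions of hybrid dataset and hybrid query concrete, here is a minimal brute-force sketch in Python. It is not taken from the survey or its repository: the record layout, the function names (`filtered_knn`, `selectivity`, `recall_at_k`), and the choice of Euclidean distance are all assumptions made for illustration. Filter selectivity is shown only as one commonly used statistic; the survey's own difficulty analysis is based on distribution relationships between data and queries and is not reproduced here.

```python
import math

# Hypothetical hybrid dataset: each item pairs a vector with scalar attributes.
data = [
    ((0.0, 0.0), {"color": "red"}),
    ((1.0, 0.0), {"color": "blue"}),
    ((0.0, 2.0), {"color": "red"}),
    ((3.0, 3.0), {"color": "red"}),
]


def filtered_knn(data, query_vec, predicate, k):
    """Exact filtered k-NN: drop items failing the scalar filter, rank the rest by L2."""
    candidates = [
        (math.dist(vec, query_vec), idx)
        for idx, (vec, attrs) in enumerate(data)
        if predicate(attrs)
    ]
    candidates.sort()
    return [idx for _, idx in candidates[:k]]


def selectivity(data, predicate):
    """Fraction of the dataset that satisfies the scalar filter."""
    return sum(1 for _, attrs in data if predicate(attrs)) / len(data)


def recall_at_k(approx_ids, exact_ids):
    """Share of the exact filtered neighbors that an approximate result recovered."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def is_red(attrs):
    # Scalar filter of the hybrid query (assumed equality predicate).
    return attrs["color"] == "red"


# Hybrid query = query vector + scalar filter + k.
exact = filtered_knn(data, (0.9, 0.1), is_red, k=2)
print(exact)                       # ids of the two nearest red items
print(selectivity(data, is_red))   # fraction of items passing the filter
print(recall_at_k([0, 3], exact))  # recall of a made-up approximate answer
```

The blue item at index 1 is the closest vector to the query overall, yet it is excluded by the filter. That exclusion is what separates FANNS from plain ANNS, and FANNS algorithms aim to approximate this exact result without scanning every item.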

🔎 Similar Papers

Dimensionality-Reduction Techniques for Approximate Nearest Neighbor Search: A Survey and Evaluation