Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data

📅 2025-05-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Filtered approximate nearest neighbor search (FANNS) lacks a systematic survey addressing vector-scalar hybrid data, and the field suffers from inconsistent problem formulations, the absence of a unified algorithm taxonomy, and insufficient analysis of query difficulty. Method: We formally define hybrid datasets and hybrid queries, propose a fine-grained algorithm taxonomy centered on pruning mechanisms, and develop a distribution-sensitive query difficulty model. We further design a standardized evaluation framework and an open-source toolchain (Python/PyTorch) supporting hybrid dataset construction, quantitative difficulty assessment, and fair algorithm comparison. Contribution/Results: This work delivers the first structured, comprehensive survey of FANNS for hybrid data, filling a critical research gap. It establishes foundational theoretical principles and practical tools, enabling rigorous analysis and reproducible advancement in hybrid-data nearest neighbor search.

📝 Abstract
Filtered approximate nearest neighbor search (FANNS), an extension of approximate nearest neighbor search (ANNS) that incorporates scalar filters, has been widely applied to constrained retrieval of vector data. Despite its growing importance, no dedicated survey on FANNS over vector-scalar hybrid data currently exists, and the field has several problems, including inconsistent definitions of the search problem, an insufficient framework for algorithm classification, and incomplete analysis of query difficulty. This survey formally defines the concepts of hybrid dataset and hybrid query, as well as the corresponding evaluation metrics. Based on these definitions, a pruning-focused framework is proposed to classify and summarize existing algorithms, providing a broader and finer-grained classification than existing frameworks. In addition, representative hybrid datasets are reviewed, followed by an analysis of the difficulty of hybrid queries from the perspective of the distribution relationships between data and queries. This paper aims to establish a structured foundation for FANNS over vector-scalar hybrid data, facilitate more meaningful comparisons between FANNS algorithms, and offer practical recommendations for practitioners. The code used for downloading hybrid datasets and analyzing query difficulty is available at https://github.com/lyj-fdu/FANNS
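To make the notions of hybrid dataset, hybrid query, and evaluation metric concrete, here is a minimal sketch in plain Python. It is not taken from the paper or its repository: the names `HybridRecord`, `filtered_knn`, and `recall_at_k` are hypothetical, the filter is assumed to be an arbitrary predicate over scalar attributes, and the search is an exact brute-force scan of the kind that might serve as ground truth when measuring the recall of an approximate FANNS algorithm.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class HybridRecord:
    """One item of a hybrid dataset: a vector plus scalar attributes (hypothetical layout)."""
    vector: Sequence[float]
    scalars: Dict[str, object]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_knn(
    dataset: List[HybridRecord],
    query_vector: Sequence[float],
    predicate: Callable[[Dict[str, object]], bool],
    k: int,
) -> List[int]:
    """Exact filtered k-NN: keep records that pass the scalar filter, then rank by distance."""
    candidates = [
        (l2(rec.vector, query_vector), idx)
        for idx, rec in enumerate(dataset)
        if predicate(rec.scalars)
    ]
    candidates.sort()
    return [idx for _, idx in candidates[:k]]


def recall_at_k(approx_ids: List[int], exact_ids: List[int]) -> float:
    """Fraction of the exact filtered neighbors that an approximate result recovered."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


dataset = [
    HybridRecord([0.0, 0.0], {"category": "a", "year": 2020}),
    HybridRecord([1.0, 0.0], {"category": "b", "year": 2021}),
    HybridRecord([0.0, 2.0], {"category": "a", "year": 2022}),
    HybridRecord([3.0, 3.0], {"category": "a", "year": 2023}),
]

# Hybrid query: a query vector plus a scalar filter (here, category == "a").
exact = filtered_knn(dataset, [0.9, 0.1], lambda s: s["category"] == "a", k=2)
```

In this toy dataset the record closest to the query vector has category "b", so it is excluded by the filter even though an unfiltered search would return it first. That gap between the unfiltered and filtered neighborhoods is what separates FANNS from plain ANNS.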
Problem

Research questions and friction points this paper is trying to address.

Lack of a dedicated survey on filtered approximate nearest neighbor search (FANNS) over vector-scalar hybrid data
Inconsistent definitions of the search problem and an insufficient classification framework for FANNS algorithms
Incomplete analysis of query difficulty in vector-scalar hybrid data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formally defines hybrid datasets, hybrid queries, and the corresponding evaluation metrics
Proposes a pruning-focused framework for classifying FANNS algorithms
Analyzes query difficulty through the distribution relationships between data and queries
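As an illustration of why pruning order and query difficulty are linked, the sketch below contrasts two strategies commonly called pre-filtering (prune by scalar first) and post-filtering (prune by vector distance first). These labels and the `selectivity` measure are assumptions of this example, not the survey's taxonomy or its difficulty model; selectivity is used only as a simple, widely used proxy for how hard a filter makes a query. All function names are hypothetical.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def selectivity(labels, predicate):
    """Fraction of records that pass the scalar filter."""
    return sum(1 for lab in labels if predicate(lab)) / len(labels)


def pre_filter_search(vectors, labels, query, predicate, k):
    """Scalar pruning first, then an exact distance ranking of the survivors."""
    passed = [i for i, lab in enumerate(labels) if predicate(lab)]
    return sorted(passed, key=lambda i: l2(vectors[i], query))[:k]


def post_filter_search(vectors, labels, query, predicate, k, fetch):
    """Unfiltered top-`fetch` search first, then drop results that fail the filter."""
    nearest = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:fetch]
    return [i for i in nearest if predicate(labels[i])][:k]


rng = random.Random(0)
vectors = [[rng.random(), rng.random()] for _ in range(1000)]
labels = [i % 100 for i in range(1000)]  # each label value covers 1% of the data
query = [0.5, 0.5]


def is_rare(label):
    """A low-selectivity filter: only 1 record in 100 passes."""
    return label == 7


exact = pre_filter_search(vectors, labels, query, is_rare, k=5)
approx = post_filter_search(vectors, labels, query, is_rare, k=5, fetch=20)

print("selectivity:", selectivity(labels, is_rare))
print("post-filter returned", len(approx), "of", len(exact), "requested results")
```

With a filter this selective, fetching only the 20 nearest unfiltered neighbors will often leave fewer than `k` survivors, whereas pre-filtering always finds them at the cost of scanning every record that passes. Raising `fetch` closes the gap but erodes the benefit of vector-side pruning, which is the kind of trade-off a difficulty analysis has to account for.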
Yanjun Lin
School of Computer Science, Fudan University, Shanghai, China
Kai Zhang
School of Computer Science, Fudan University, Shanghai, China
Zhenying He
School of Computer Science, Fudan University, Shanghai, China
Yinan Jing
School of Computer Science, Fudan University, Shanghai, China
X. Sean Wang
School of Computer Science, Fudan University
Database Systems, Information Security and Privacy, Wireless Sensor Networks, Streaming Data Processing, Time Series Queries, Dat