An In-Depth Study of Filter-Agnostic Vector Search on a PostgreSQL Database System: [Experiments and Analysis]

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the gap between theoretical vector retrieval evaluation and real-world performance by systematically assessing filter-agnostic vector search strategies in a production-grade, PostgreSQL-compatible database. Unlike prior studies that rely on specialized libraries and overlook system-level overheads, this paper evaluates post-filtering versus inline filtering across varying filter selectivities and correlations, using graph-based indexes (e.g., NaviX/ACORN) and clustering-based indexes (e.g., ScaNN). The findings reveal that graph indexes often underperform ScaNN in realistic settings due to frequent filter checks and substantial system overhead. The study argues that algorithm selection should holistically consider distance computation costs, filtering operations, and data access overhead, not merely theoretical complexity, thereby offering practical guidance for deploying efficient vector search systems in enterprise environments.

📝 Abstract
Filtered Vector Search (FVS) is critical for supporting semantic search and GenAI applications in modern database systems. However, existing research most often evaluates algorithms in specialized libraries, making optimistic assumptions that do not align with enterprise-grade database systems. Our work challenges these assumptions by demonstrating that, in a production-grade database system, they do not hold, leading to performance characteristics and algorithmic trade-offs that are fundamentally different from those observed in isolated library settings. This paper presents the first in-depth analysis of filter-agnostic FVS algorithms within a production PostgreSQL-compatible system. We systematically evaluate post-filtering and inline-filtering strategies across a wide range of selectivities and correlations. Our central finding is that the optimal algorithm is not dictated by the cost of distance computations alone: system-level overheads incurred by both distance computations and filter operations (such as page accesses and data retrieval) also play a significant role. We demonstrate that graph-based approaches (such as NaviX/ACORN) can incur prohibitive numbers of filter checks and system-level overheads compared with clustering-based indexes such as ScaNN, often canceling out their theoretical benefits in real-world database environments. Ultimately, our findings provide the database community with crucial insights and practical guidelines, demonstrating that the optimal choice of a filter-agnostic FVS algorithm is not absolute, but rather a system-aware decision contingent on the interplay between workload characteristics and the underlying costs of data access in a real-world database architecture.
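To make the two strategies named in the abstract concrete, the following is a minimal sketch of post-filtering and inline filtering over an exhaustive scan. It is an illustration only, not the paper's implementation: it uses no graph or clustering index, and the function names, the `overfetch` parameter, and the `stats` counters are hypothetical names introduced here. The counters track only the two costs the abstract highlights (distance computations and filter checks); the page-access and data-retrieval overheads of a real database are not modeled.

```python
import heapq


def l2_squared(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def post_filter_search(vectors, passes_filter, query, k, overfetch):
    """Post-filtering sketch: rank by distance first, filter afterwards.

    Fetches the k * overfetch nearest candidates without looking at the
    filter, then keeps the first k that satisfy it. `overfetch` is a
    hypothetical tuning knob for this sketch.
    """
    stats = {"distance_computations": 0, "filter_checks": 0}
    scored = []
    for idx, vec in enumerate(vectors):
        stats["distance_computations"] += 1
        scored.append((l2_squared(vec, query), idx))

    results = []
    for _, idx in heapq.nsmallest(k * overfetch, scored):
        stats["filter_checks"] += 1
        if passes_filter(idx):
            results.append(idx)
            if len(results) == k:
                break
    return results, stats


def inline_filter_search(vectors, passes_filter, query, k):
    """Inline-filtering sketch: check the filter on every visited vector
    and compute distances only for the ones that pass."""
    stats = {"distance_computations": 0, "filter_checks": 0}
    scored = []
    for idx, vec in enumerate(vectors):
        stats["filter_checks"] += 1
        if not passes_filter(idx):
            continue
        stats["distance_computations"] += 1
        scored.append((l2_squared(vec, query), idx))

    results = [idx for _, idx in heapq.nsmallest(k, scored)]
    return results, stats
```

In this toy setting, a selective filter combined with a small `overfetch` can leave post-filtering with fewer than `k` results, while inline filtering always pays one filter check per visited vector. In a real system each of those checks may also trigger data access, which is the kind of overhead the abstract argues must be weighed alongside distance computations.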
Problem

Research questions and friction points this paper is trying to address.

Filtered Vector Search
PostgreSQL
filter-agnostic
system-level overhead
vector search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Filter-Agnostic Vector Search
PostgreSQL
System-Level Overhead
Graph-Based Indexing
Clustering-Based Indexing
Duo Lu
Rider University
Robotics Perception, IoT, Intelligent Transportation Systems, In-Air Handwriting, Cybersecurity
Helena Caminal
Google, USA
Manos Chatzakis
Université Paris Cité, LIPADE, France
Yannis Papakonstantinou
Google, USA
Yannis Chronis
ETH Zurich
Databases, Data Management, HW-SW codesign, Hardware Acceleration, Distributed Systems
Vaibhav Jain
Google, India
Fatma Özcan
Google, USA