Efficient Vector Search in the Wild: One Model for Multi-K Queries

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing learning-based Top-K vector retrieval methods struggle to simultaneously achieve high accuracy, efficiency, and low preprocessing overhead when handling queries with variable K values. This work proposes OMEGA, the first approach to enable a single model to efficiently support arbitrary K-value retrieval. OMEGA trains a base model using trajectory features derived from K=1 queries and integrates a dynamic refinement mechanism with a statistics-driven model invocation strategy. Under the same preprocessing budget, OMEGA reduces average latency by 6–33% compared to state-of-the-art methods. Moreover, it achieves 1.01–1.28× the optimal baseline latency while requiring only 16–30% of the preprocessing time, effectively unifying the often conflicting demands of high accuracy, high performance, and low overhead across diverse K-value scenarios.

📝 Abstract
Learned top-K search is a promising approach for serving vector queries with both high accuracy and performance. However, current models trained for a specific K value fail to generalize to real-world multi-K queries: they suffer from accuracy degradation (for larger Ks) and performance loss (for smaller Ks). Training the model to generalize on different Ks requires orders of magnitude more preprocessing time and is not suitable for serving vector queries in the wild. We present OMEGA, a K-generalizable learned top-K search method that simultaneously achieves high accuracy, high performance, and low preprocessing cost for multi-K vector queries. The key idea is that a base model properly trained on K=1 with our trajectory-based features can be used to accurately predict larger Ks with a dynamic refinement procedure and smaller Ks with minimal performance loss. To make our refinements efficient, we further leverage the statistical properties of top-K searches to reduce excessive model invocations. Extensive evaluations on multiple public and production datasets show that, under the same preprocessing budgets, OMEGA achieves 6-33% lower average latency compared to state-of-the-art learned search methods, while all systems achieve the same recall target. With only 16-30% of the preprocessing time, OMEGA attains 1.01-1.28x of the optimal average latency of these baselines.
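The abstract describes a loop in which a learned model decides when a top-K scan can terminate, with cheap statistics gating the comparatively expensive model invocations. Below is a minimal illustrative sketch of that general idea, not OMEGA's actual implementation: the scan order, the "staleness" feature, the threshold, and all function names here are invented for illustration.

```python
import heapq
import numpy as np

def learned_topk_search(query, base, k, stop_model, check_every=32):
    """Scan `base` in index order (a stand-in for a real search trajectory)
    and stop early once `stop_model` predicts the running top-k is final.
    The model is only invoked at fixed checkpoints, using a cheap statistic
    (how long the top-k has been unchanged) as its input feature."""
    heap = []          # max-heap of size k via negated distances
    last_update = 0    # index of the last candidate that changed the top-k
    for i, vec in enumerate(base):
        d = float(np.linalg.norm(query - vec))
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
            last_update = i
        elif -d > heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
            last_update = i
        if (i + 1) % check_every == 0 and len(heap) == k:
            # feature: checkpoint windows elapsed since the top-k last changed
            stale_windows = (i - last_update) / check_every
            if stop_model(stale_windows, k):
                break
    # return (distance, id) pairs, nearest first
    return sorted((-nd, idx) for nd, idx in heap)

def toy_stop_model(stale_windows, k):
    """Hypothetical stand-in for a trained termination predictor:
    stop after the top-k has been stable for two checkpoint windows."""
    return stale_windows >= 2.0
```

Passing a model that never fires (`lambda stale, k: False`) degenerates to an exact scan, which is one way to sanity-check such a sketch against brute-force top-K.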
Problem

Research questions and friction points this paper is trying to address.

vector search
top-K retrieval
multi-K queries
learned search
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

learned top-K search
K-generalizable
trajectory-based features
dynamic refinement
vector search