🤖 AI Summary
Existing embedding-based retrieval systems use a fixed recall budget for every query, which yields insufficient recall for head queries and degraded precision for tail queries; the paper traces this to the frequentist perspective underlying common loss-function designs. It proposes a query-aware probabilistic retrieval framework that abandons fixed truncation and instead models the query-specific distribution of candidate cosine similarities. By learning a query-conditioned cumulative distribution function (CDF), the framework determines a similarity threshold dynamically for each query, jointly optimizing precision and recall for both head and tail queries within a unified probabilistic formulation. Experiments on industrial retrieval tasks show significant improvements in both precision and recall, and ablation studies validate the benefit of explicitly modeling head-tail disparities. The core contribution is elevating threshold selection from a heuristic, static configuration to a principled, query-adaptive probabilistic decision.
📝 Abstract
Embedding retrieval aims to learn a shared semantic representation space for both queries and items, enabling efficient and effective item retrieval via approximate nearest neighbor (ANN) algorithms. In current industrial practice, retrieval systems typically retrieve a fixed number of items for every query, which leads to insufficient retrieval (low recall) for head queries and irrelevant retrieval (low precision) for tail queries. Largely because loss functions have traditionally been designed from a frequentist perspective, the industry has so far lacked a satisfactory solution that addresses this challenge holistically. In this paper, we move away from the frequentist approach and take a novel **p**robabilistic approach to **e**mbedding **b**ased **r**etrieval (namely **pEBR**) by learning the item distribution for different queries, which enables a dynamic cosine similarity threshold computed from the probabilistic cumulative distribution function (CDF) value. The experimental results show that our approach improves both retrieval precision and recall significantly. Ablation studies also illustrate how the probabilistic approach captures the differences between head and tail queries.
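The core idea, a per-query similarity cutoff derived from a query-conditioned CDF, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes, purely for illustration, that a model has already predicted a Gaussian over each query's candidate cosine similarities (parameters `mu`, `sigma`), and thresholds at a chosen quantile of that distribution instead of truncating at a fixed top-k.

```python
from statistics import NormalDist

def dynamic_threshold(mu: float, sigma: float, keep_top: float = 0.05) -> float:
    """Similarity cutoff keeping roughly the top `keep_top` probability mass.

    Assumes the query's candidate cosine similarities follow N(mu, sigma^2);
    the cutoff is the (1 - keep_top) quantile, i.e. the inverse CDF.
    """
    return NormalDist(mu, sigma).inv_cdf(1.0 - keep_top)

def retrieve(items: list, sims: list, mu: float, sigma: float,
             keep_top: float = 0.05) -> list:
    """Keep items whose cosine similarity clears the query-specific cutoff."""
    t = dynamic_threshold(mu, sigma, keep_top)
    return [item for item, s in zip(items, sims) if s >= t]

# A "head" query with many strong matches gets a higher cutoff than a
# "tail" query whose similarity distribution is centered lower, so the
# number of retrieved items adapts per query rather than being fixed.
```

Under this sketch, two queries with the same `keep_top` but different predicted `(mu, sigma)` receive different absolute cutoffs, which is exactly what replaces the fixed-truncation heuristic.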