🤖 AI Summary
For complex queries with multifaceted requirements and nuanced semantics, document relevance becomes highly context-dependent, rendering conventional re-ranking methods inadequate. To address this, we propose a “contextual relevance” modeling framework that formalizes relevance as the probability that a document is relevant to a query, marginalized over the re-ranking contexts (the composition and ordering of the candidate set) in which it may appear, revealing the substantial impact of both factors on large language model (LLM) relevance judgments. We introduce TS-SetRank, a novel algorithm integrating Bayesian uncertainty estimation with Thompson sampling, to enable uncertainty-aware, adaptive set-level sampling and re-ranking. Evaluated on the BRIGHT and BEIR benchmarks, TS-SetRank improves nDCG@10 by 15–25% and 6–21%, respectively, outperforming state-of-the-art retrieval and re-ranking approaches. Our core contributions are: (i) the first formal definition of contextual relevance, and (ii) the establishment of a set-level, uncertainty-driven re-ranking paradigm.
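One way to write this formalization down, under our reading of the summary (the notation here is our own, not the paper's), is as an expectation over re-ranking contexts:

```latex
% Contextual relevance of document d for query q:
% c is a re-ranking context, i.e. a candidate set containing d
% together with a presentation order; P(c) is the distribution
% over such contexts.
\mathrm{rel}(d \mid q)
  \;=\;
  \mathbb{E}_{c \sim P(\mathcal{C}_d)}
  \bigl[\, \Pr(d \text{ is relevant} \mid q, c) \,\bigr]
```

Point estimates from a single context (one batch, one ordering) can thus be biased; averaging over contexts is what the sampling-based estimator targets.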
📝 Abstract
Reranking algorithms have made progress in improving document retrieval quality by efficiently aggregating relevance judgments generated by large language models (LLMs). However, identifying relevant documents for queries that require in-depth reasoning remains a major challenge. Reasoning-intensive queries often exhibit multifaceted information needs and nuanced interpretations, rendering document relevance inherently context-dependent. To address this, we propose contextual relevance, which we define as the probability that a document is relevant to a given query, marginalized over the distribution of different reranking contexts it may appear in (i.e., the set of candidate documents it is ranked alongside and the order in which the documents are presented to a reranking model). While prior work has studied methods to mitigate the positional bias LLMs exhibit by accounting for the ordering of documents, we empirically find that the composition of these batches also plays an important role in reranking performance. To efficiently estimate contextual relevance, we propose TS-SetRank, a sampling-based, uncertainty-aware reranking algorithm. Empirically, TS-SetRank improves nDCG@10 over retrieval and reranking baselines by 15–25% on BRIGHT and 6–21% on BEIR, highlighting the importance of modeling relevance as context-dependent.
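The abstract does not spell out TS-SetRank's mechanics, but a generic Thompson-sampling set reranker in this spirit can be sketched as follows. Everything here is an assumption for illustration: the function name `ts_setrank`, the Beta-Bernoulli posterior, the batch construction, and the stand-in `judge` callable (which substitutes for an LLM relevance call over one sampled context) are ours, not the paper's.

```python
import random

def ts_setrank(docs, judge, rounds=300, batch_size=4, seed=0):
    """Hypothetical sketch: Thompson sampling over reranking contexts.

    docs:  list of document ids to rerank.
    judge: callable(batch) -> set of doc ids judged relevant within
           that batch; stands in for an LLM judging one sampled
           context (batch composition + order).
    """
    rng = random.Random(seed)
    # Beta(alpha, beta) posterior over each document's contextual relevance.
    post = {d: [1.0, 1.0] for d in docs}
    for _ in range(rounds):
        # Thompson step: draw a score for each doc from its posterior...
        sampled = {d: rng.betavariate(a, b) for d, (a, b) in post.items()}
        # ...then judge the top-scoring documents as one sampled context,
        # shuffled so order varies across contexts as well.
        batch = sorted(docs, key=lambda d: sampled[d], reverse=True)[:batch_size]
        rng.shuffle(batch)
        relevant = judge(batch)
        for d in batch:
            if d in relevant:
                post[d][0] += 1  # observed relevant in this context
            else:
                post[d][1] += 1  # observed non-relevant in this context
    # Final ranking by posterior mean contextual relevance.
    return sorted(docs, key=lambda d: post[d][0] / sum(post[d]), reverse=True)

# Toy usage: a noiseless stand-in judge that treats docs 0-2 as relevant.
docs = list(range(8))
ranking = ts_setrank(docs, judge=lambda batch: {d for d in batch if d < 3})
```

The Beta posterior is what makes the sampler uncertainty-aware: documents with few observations still get explored in new contexts, while well-estimated documents settle into their rank.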