Efficient Context Selection for Long-Context QA: No Tuning, No Iteration, Just Adaptive-$k$

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
In open-domain question answering, external evidence retrieval suffers from inflexible top-$k$ selection: a fixed $k$ leads either to token waste or to omission of critical information, while existing adaptive methods underperform on aggregation QA. This paper proposes Adaptive-$k$, a single-pass adaptive retrieval method that requires no fine-tuning, no iteration, and no extra LLM calls: it dynamically determines the number of retrieved passages from the statistical distribution of query-passage similarity scores. By adaptively thresholding this distribution, Adaptive-$k$ retrieves about 70% of relevant passages while using up to 10× fewer tokens than full-context input. It is agnostic to the retriever and the LLM, and integrates seamlessly into RAG and long-context LLM (LCLM) pipelines. Extensive evaluation shows that Adaptive-$k$ matches or outperforms fixed-$k$ baselines on both factoid and aggregation QA benchmarks, improving accuracy and the efficiency–effectiveness trade-off across five long-context LLMs and two embedding models.

📝 Abstract
Retrieval-augmented generation (RAG) and long-context language models (LCLMs) both address context limitations of LLMs in open-domain question answering (QA). However, how much external context to retrieve remains an open problem: fixing the retrieval size risks either wasting tokens or omitting key evidence. Existing adaptive methods like Self-RAG and Self-Route rely on iterative LLM prompting and perform well on factoid QA, but struggle with aggregation QA, where the optimal context size is both unknown and variable. We present Adaptive-$k$ retrieval, a simple and effective single-pass method that adaptively selects the number of passages based on the distribution of the similarity scores between the query and the candidate passages. It does not require model fine-tuning, extra LLM inferences, or changes to existing retriever-reader pipelines. On both factoid and aggregation QA benchmarks, Adaptive-$k$ matches or outperforms fixed-$k$ baselines while using up to 10× fewer tokens than full-context input, yet still retrieves 70% of relevant passages. It improves accuracy across five LCLMs and two embedding models, highlighting that dynamically adjusting context size leads to more efficient and accurate QA.
Problem

Research questions and friction points this paper is trying to address.

Adaptively selects optimal context size for QA
Eliminates need for model tuning or iteration
Improves efficiency and accuracy in retrieval-augmented QA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive-$k$ retrieval for dynamic context selection
No tuning or extra LLM inferences needed
Single-pass method based on similarity scores
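The core idea above, choosing $k$ from the shape of the query-passage similarity distribution rather than fixing it in advance, can be sketched as follows. This is a minimal illustration using a largest-gap heuristic; the function name `adaptive_k` and the specific cut rule are assumptions for illustration, not the paper's exact thresholding procedure.

```python
import numpy as np

def adaptive_k(scores, min_k=1, max_k=None):
    """Pick how many passages to keep by cutting at the largest drop
    in the sorted query-passage similarity scores.

    A hypothetical single-pass heuristic in the spirit of Adaptive-k
    (not the paper's exact rule): no fine-tuning, no extra LLM calls.
    Returns the indices of the selected passages, best first.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]          # passage indices, best first
    ranked = scores[order]                    # scores in descending order
    if max_k is None:
        max_k = len(ranked)
    # Gap between each adjacent pair of ranked scores.
    gaps = ranked[:-1] - ranked[1:]
    # Cut just after the largest drop within [min_k, max_k].
    k = int(np.argmax(gaps[min_k - 1:max_k]) + min_k)
    return order[:k]

# Example: two clearly relevant passages, then a sharp score drop.
selected = adaptive_k([0.2, 0.9, 0.85, 0.1])
```

Because the cut point tracks the score distribution, an aggregation query whose evidence is spread across many passages yields a shallower drop-off and a larger $k$, while a factoid query with one dominant passage yields a small $k$, which is the adaptivity the summary describes.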