🤖 AI Summary
Current neural retrieval models exhibit limited reasoning capabilities, while large language models (LLMs) incur prohibitive computational costs; moreover, query rewriting approaches struggle to support the iterative exploration and dynamic revision that complex queries require. This paper proposes Orion, a framework that integrates reinforcement learning (RL) with synthetic trajectory training to enable a lightweight (1.2B-parameter) language model to autonomously perform multi-step reasoning, self-reflection, and dynamic query optimization during retrieval. Orion unifies synthetic trajectory generation, supervised fine-tuning, RL-based policy optimization, and inference-time beam search, enabling end-to-end learning of retrieval strategies. Experiments show that Orion outperforms state-of-the-art retrievers with 200-400x more parameters on five of six mainstream benchmarks, reaching 77.6% on SciFact, 25.2% on BRIGHT, and 63.2% on NFCorpus, thereby challenging the "scale-only" paradigm in neural retrieval.
📝 Abstract
Effective information retrieval requires reasoning over partial evidence and refining strategies as information emerges. Yet current approaches fall short: neural retrievers lack reasoning capabilities, large language models (LLMs) provide semantic depth but at prohibitive cost, and query rewriting or decomposition limits improvement to static, one-shot transformations. As a result, existing methods fail to capture the iterative dynamics of exploration, feedback, and revision that complex user queries demand. We introduce Orion, a training framework that enables compact models (350M-1.2B parameters) to perform iterative retrieval through learned search strategies. Orion combines: (1) synthetic trajectory generation and supervised fine-tuning to encourage diverse exploration patterns, (2) reinforcement learning (RL) that rewards effective query refinement and backtracking behaviors, and (3) inference-time beam search that exploits the self-reflection capabilities learned during RL. Despite using only 3% of the available training data, our 1.2B model achieves 77.6% success on SciFact (vs. 72.6% for prior retrievers), 25.2% on BRIGHT (vs. 22.1%), 63.2% on NFCorpus (vs. 57.8%), and remains competitive on FEVER, HotpotQA, and MS MARCO. It outperforms retrievers up to 200-400x larger on five of six benchmarks. These findings suggest that retrieval performance can emerge from learned strategies, not just model scale, when models are trained to search, reflect, and revise.
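The inference-time component in (3) can be sketched as a beam search over query refinements: keep the top-scoring candidate queries at each step, expand each with proposed rewrites, and stop when no refinement improves retrieval. This is a minimal illustration, not the paper's implementation; `propose` and `score` are hypothetical stand-ins for the learned policy model and the retrieval success signal.

```python
from typing import Callable, List, Tuple

def beam_search_retrieval(
    query: str,
    propose: Callable[[str], List[str]],  # stand-in for the policy's query refinements
    score: Callable[[str], float],        # stand-in for a retrieval success proxy
    beam_width: int = 3,
    max_steps: int = 4,
) -> Tuple[str, float]:
    """Keep the top-`beam_width` refinements per step; halt when nothing improves."""
    beam = [(score(query), query)]
    for _ in range(max_steps):
        candidates = list(beam)  # keeping the beam allows implicit backtracking
        for _, q in beam:
            for refined in propose(q):
                candidates.append((score(refined), refined))
        candidates.sort(key=lambda pair: -pair[0])
        new_beam = candidates[:beam_width]
        if new_beam[0][0] <= beam[0][0]:  # no candidate beats the current best: stop
            break
        beam = new_beam
    return beam[0][1], beam[0][0]

# Toy stand-ins: score = overlap with the vocabulary of a hypothetical target document.
TARGET = {"aspirin", "reduces", "cardiac", "risk"}

def toy_score(q: str) -> float:
    return len(set(q.lower().split()) & TARGET) / len(TARGET)

def toy_propose(q: str) -> List[str]:
    return [q + " cardiac", q + " risk", q.replace("heart", "cardiac")]

best_q, best_s = beam_search_retrieval("aspirin reduces heart attacks", toy_propose, toy_score)
```

In the toy run the search rewrites "aspirin reduces heart attacks" until the query covers the target vocabulary, mirroring how learned refinement plus beam search can recover terms the initial query misses.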