Think Before You Retrieve: Learning Test-Time Adaptive Search with Small Language Models

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current neural retrieval models exhibit limited reasoning capabilities, while large language models (LLMs) incur prohibitive computational costs; moreover, query rewriting approaches struggle to support the iterative exploration and dynamic revision that complex queries require. This paper proposes Orion, a framework that integrates synthetic trajectory training with reinforcement learning (RL) to enable a lightweight (1.2B-parameter) language model to autonomously perform multi-step reasoning, self-reflection, and dynamic query optimization prior to retrieval. Orion unifies synthetic trajectory generation, supervised fine-tuning, RL-based policy optimization, and beam search during inference, enabling end-to-end learning of retrieval strategies. Experiments show that Orion outperforms state-of-the-art retrievers up to 200–400× larger on five of six mainstream benchmarks, reaching 77.6% success on SciFact, 25.2% on BRIGHT, and 63.2% on NFCorpus, thereby challenging the "scale-only" paradigm in neural retrieval.

📝 Abstract
Effective information retrieval requires reasoning over partial evidence and refining strategies as information emerges. Yet current approaches fall short: neural retrievers lack reasoning capabilities, large language models (LLMs) provide semantic depth but at prohibitive cost, and query rewriting or decomposition limits improvement to static transformations. As a result, existing methods fail to capture the iterative dynamics of exploration, feedback, and revision that complex user queries demand. We introduce Orion, a training framework that enables compact models (350M-1.2B parameters) to perform iterative retrieval through learned search strategies. Orion combines: (1) synthetic trajectory generation and supervised fine-tuning to encourage diverse exploration patterns in models, (2) reinforcement learning (RL) that rewards effective query refinement and backtracking behaviors, and (3) inference-time beam search algorithms that exploit the self-reflection capabilities learned during RL. Despite using only 3% of the training data available, our 1.2B model achieves 77.6% success on SciFact (vs. 72.6% for prior retrievers), 25.2% on BRIGHT (vs. 22.1%), 63.2% on NFCorpus (vs. 57.8%), and remains competitive on FEVER, HotpotQA, and MSMarco. It outperforms retrievers up to 200-400x larger on five of six benchmarks. These findings suggest that retrieval performance can emerge from learned strategies, not just model scale, when models are trained to search, reflect, and revise.
Problem

Research questions and friction points this paper is trying to address.

Neural retrievers lack reasoning for complex queries
Large language models are too costly for retrieval tasks
Current methods fail to capture iterative exploration dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains compact models via synthetic trajectory generation
Uses reinforcement learning for query refinement behaviors
Implements inference-time beam search with self-reflection capabilities
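The inference-time search described above can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: `propose_rewrites` takes the place of the trained policy model generating candidate query refinements, and `score` takes the place of retrieval feedback; neither reflects Orion's actual components or scoring signal. The sketch only shows the beam-search skeleton of keeping the top-scoring rewrites at each step while allowing the current query to survive unchanged (i.e., the model may "decide" to stop refining).

```python
def propose_rewrites(query):
    """Toy stand-in for the policy model: emit candidate refinements."""
    return [query + " evidence", query + " study", query + " review"]

def score(query):
    """Toy stand-in for retrieval feedback: reward more specific queries."""
    return len(set(query.split()))

def beam_search(query, beam_width=2, depth=3):
    """Keep the top `beam_width` rewrites per round for `depth` rounds."""
    beam = [(score(query), query)]
    for _ in range(depth):
        candidates = []
        for s, q in beam:
            candidates.append((s, q))  # keep the current query (allows stopping early)
            for rewrite in propose_rewrites(q):
                candidates.append((score(rewrite), rewrite))
        # retain only the highest-scoring candidates for the next round
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beam[0][1]

best = beam_search("aspirin heart disease")
```

Because the original query is carried forward as a candidate in every round, the returned query can never score worse than the input under the scoring function, which mirrors how a learned stopping behavior prevents refinement from degrading a good query.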