Retrieval, Reward, and Training Protocols: What Matters in Training Search Agents?

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to training search agents differ in retrieval corpora, reward design, and training protocols, and have not been compared under controlled conditions, which makes it hard to identify what actually drives performance gains. This work runs controlled experiments with multiple base large language models to evaluate the impact of corpus coverage, reward design (outcome-based versus process-based), and training strategy. The study finds that a data-coverage issue in the widely used Wikipedia 2018 corpus significantly constrains agent performance, and that correcting it alone yields larger gains than the differences between training algorithms. Across three base models, the simplest outcome-based reward matches or outperforms process-based rewards in most settings, and process-level credit assignment can over-correct agent behavior. Building on these findings, the paper distills practical guidelines covering training data diversity, off-policy data utilization, and search budget scaling, and releases its code as open source.
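To make the corpus-coverage point concrete, the following minimal sketch estimates what fraction of questions have at least one gold answer string present anywhere in a retrieval corpus. It is an illustration only, not the paper's diagnostic: the function names `normalize` and `answer_coverage`, the substring-matching criterion, and the toy data are all assumptions made here.

```python
import re
import string
from typing import List


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_coverage(gold_answers: List[List[str]], passages: List[str]) -> float:
    """Fraction of questions with at least one gold answer found in the corpus.

    Hypothetical helper: uses plain substring matching on normalized text,
    which can over-count (an answer may match inside a longer word).
    """
    if not gold_answers:
        return 0.0
    normalized_passages = [normalize(p) for p in passages]
    covered = 0
    for answers in gold_answers:
        targets = [normalize(a) for a in answers]
        # A question counts as covered if any non-empty answer alias
        # appears in any passage.
        if any(t and t in p for t in targets for p in normalized_passages):
            covered += 1
    return covered / len(gold_answers)


# Toy data (invented for illustration).
passages = [
    "Paris is the capital of France.",
    "The Nile flows through Egypt.",
]
gold = [["Paris"], ["Amazon"], ["the Nile", "Nile River"]]
coverage = answer_coverage(gold, passages)
print(f"coverage = {coverage:.2f}")
```

A low value from a check like this would suggest that some questions cannot be answered from the corpus regardless of how the agent is trained, which is the kind of ceiling the summary describes.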

📝 Abstract

Search agents powered by large language models can autonomously decompose queries, retrieve information, and synthesize answers through multi-step reasoning. However, the rapid growth of training methods has outpaced controlled comparison: existing works differ in retrieval corpora, reward designs, and training protocols, making it unclear what actually drives improvements. We present a controlled empirical study that isolates three under-explored dimensions of search agent training. First, we identify a critical data-coverage issue in the widely used Wikipedia 2018 corpus and show that correcting it alone yields larger gains than the differences between training algorithms. Second, we systematically compare outcome-based and process-based reward methods across three base models, finding that the simplest outcome-based approach achieves competitive or superior performance in most settings, and that process-level credit assignment can over-correct agent behavior. Third, we analyze training data diversity, off-policy data utilization, and search budget scaling, distilling practical guidelines for training effective search agents. Our code is available at https://github.com/YiboZhao624/SearchAgentReview.
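The distinction between outcome-based and process-based rewards can be sketched as follows. This is a simplified illustration under stated assumptions, not the reward definitions used in the paper: the exact-match criterion, the function names, and the `step_scorer` callback are hypothetical. The outcome-based variant gives every step of a trajectory the same final-answer score, while the process-based variant scores each step separately and adds the final-answer score to the last step.

```python
import re
import string
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def outcome_reward(prediction: str, gold_answers: List[str]) -> float:
    """1.0 if the final answer exactly matches any gold answer after normalization."""
    pred = normalize(prediction)
    return 1.0 if any(pred == normalize(g) for g in gold_answers) else 0.0


def outcome_step_rewards(
    num_steps: int, prediction: str, gold_answers: List[str]
) -> List[float]:
    """Outcome-based credit: every step shares the single trajectory-level reward."""
    return [outcome_reward(prediction, gold_answers)] * num_steps


def process_step_rewards(
    steps: List[str],
    step_scorer: Callable[[str], float],
    prediction: str,
    gold_answers: List[str],
) -> List[float]:
    """Process-based credit: each step gets its own score from `step_scorer`
    (a hypothetical judge), and the last step also receives the outcome reward."""
    rewards = [step_scorer(step) for step in steps]
    if rewards:
        rewards[-1] += outcome_reward(prediction, gold_answers)
    return rewards


# Toy trajectory (invented for illustration).
steps = ["search: tallest structure in Paris", "answer: The Eiffel Tower."]
print(outcome_step_rewards(len(steps), "The Eiffel Tower.", ["Eiffel Tower"]))
print(process_step_rewards(steps, lambda s: 0.5, "The Eiffel Tower.", ["Eiffel Tower"]))
```

In this framing, a per-step scorer that penalizes intermediate searches too aggressively could discourage useful exploration, which is one way to read the abstract's remark that process-level credit assignment can over-correct agent behavior.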

Problem

Research questions and friction points this paper is trying to address.

search agents

retrieval corpus

reward design

training protocols

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

search agents

retrieval corpus

reward design