🤖 AI Summary
This work addresses two coupled obstacles to training reinforcement learning (RL) agents for deep research tasks. First, relying on real-world search engines during training makes it unstable and computationally expensive. Second, synthetic data cannot faithfully replicate real search dynamics. To overcome these limitations, the authors propose LiteResearcher, a framework that replaces live search engines with a lightweight simulated search environment. Combined with a dynamic reward mechanism and a continual optimization strategy, this environment enables efficient and stable RL. The approach drastically reduces training cost and allows a compact 4B-parameter model to reach 71.3% accuracy on GAIA and 78.0% on Xbench. These are state-of-the-art results among open-source models and surpass both larger open-source models and prominent commercial ones.
📝 Abstract
Reinforcement Learning (RL) has emerged as a powerful training paradigm for LLM-based agents. However, scaling agentic RL for deep research remains constrained by two coupled challenges: hand-crafted synthetic data fails to elicit genuine real-world search capabilities, and dependence on real-world search during RL training introduces instability and prohibitive cost. LiteResearcher is a training framework that makes agentic RL scalable: by constructing a lite virtual world that mirrors real-world search dynamics, we enable a continuously improving training recipe that empowers a tiny search agent to outperform large-scale open-source and commercial models (e.g., Tongyi DeepResearch and Claude-4.5 Sonnet). Specifically, on common benchmarks such as GAIA and Xbench, our LiteResearcher-4B achieves open-source state-of-the-art results of 71.3% and 78.0%, respectively, demonstrating that scalable RL training is a key enabler for Deep Research Agents.
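To make the idea of swapping a live search engine for a local simulated one more concrete, here is a minimal sketch. It assumes a tiny in-memory corpus and TF-IDF term-overlap ranking. `Page`, `LocalSearchEnv`, `search`, `visit`, and the `sim://` URLs are hypothetical names chosen for illustration. This is not the LiteResearcher implementation, whose details the abstract does not describe. It only shows why an offline tool makes rollouts deterministic and free of per-call API cost.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


@dataclass
class Page:
    url: str
    title: str
    body: str


class LocalSearchEnv:
    """Hypothetical offline stand-in for a web search tool during RL rollouts.

    Every call is answered from a fixed in-memory corpus, so identical queries
    always return identical results and no external API is called.
    Ranking by TF-IDF overlap is an assumption made for this sketch.
    """

    def __init__(self, pages: list[Page]):
        self.pages = {p.url: p for p in pages}
        self.doc_tokens = {
            p.url: Counter(tokenize(p.title + " " + p.body)) for p in pages
        }
        n = len(pages)
        df = Counter()
        for counts in self.doc_tokens.values():
            df.update(counts.keys())
        # Smoothed inverse document frequency for each term in the corpus.
        self.idf = {t: math.log((n + 1) / (d + 1)) + 1.0 for t, d in df.items()}

    def search(self, query: str, k: int = 3) -> list[dict]:
        q = set(tokenize(query))
        scored = []
        for url, counts in self.doc_tokens.items():
            score = sum(counts[t] * self.idf[t] for t in q if t in counts)
            if score > 0:
                scored.append((score, url))
        # Highest score first; ties broken by URL so results are deterministic.
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [
            {"url": url, "title": self.pages[url].title,
             "snippet": self.pages[url].body[:80]}
            for _, url in scored[:k]
        ]

    def visit(self, url: str) -> str:
        # Return the full page text, or an error message the agent can read.
        page = self.pages.get(url)
        return page.body if page else f"Error: {url} not found in the simulated corpus."


# Usage with a two-page toy corpus (contents are illustrative only).
env = LocalSearchEnv([
    Page("sim://gaia-info", "GAIA benchmark",
         "GAIA evaluates general AI assistants on multi-step questions."),
    Page("sim://xbench-info", "Xbench",
         "Xbench is a benchmark suite for evaluating agents."),
])
hits = env.search("GAIA assistants")
print(hits[0]["url"])  # sim://gaia-info
```

Because the corpus is fixed and ranking has no randomness, replaying a rollout reproduces the same observations. A real simulated environment would need a far larger corpus and a more faithful model of search behavior to mirror real-world dynamics.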