ZeroSearch: Incentivize the Search Capability of LLMs without Searching

📅 2025-05-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two challenges in RL training of LLM search capability: the unpredictable quality of documents returned by real-world search engines and the high API costs of frequent rollouts. It proposes ZeroSearch, a "search-free" reinforcement learning framework. Methodologically: (1) lightweight supervised fine-tuning transforms an LLM into a retrieval module that can generate both relevant and noisy documents in response to a query; (2) a curriculum-based rollout strategy progressively degrades the quality of the generated documents, eliciting robust retrieval and reasoning; the framework is compatible with mainstream RL algorithms and with both base and instruction-tuned models of various sizes. Experiments show that a 3B retrieval module already incentivizes effective search capability, a 7B module matches real search engine performance, and a 14B module surpasses it. Crucially, this paradigm eliminates dependence on external search APIs, enabling efficient, controllable, and low-cost training of LLM search capabilities.

📝 Abstract
Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a reinforcement learning framework that incentivizes the search capabilities of LLMs without interacting with real search engines. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
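The curriculum-based rollout strategy in the abstract can be sketched as a schedule for the probability that a generated document is noisy, rising from easy to hard over training. The exponential interpolation and parameter names below are illustrative assumptions, not the paper's exact formula:

```python
def noise_probability(step, total_steps, p_start=0.0, p_end=0.5, base=4.0):
    """Curriculum schedule (illustrative): probability that the simulated
    retrieval module emits a noisy document at a given training step.

    Interpolates exponentially from p_start (easy: mostly useful documents)
    to p_end (hard: frequent noise) as training progresses.
    """
    frac = step / total_steps  # fraction of training completed, in [0, 1]
    return p_start + (base ** frac - 1) / (base - 1) * (p_end - p_start)
```

Early in training the policy sees mostly useful documents; later steps increasingly mix in noisy ones, exposing the model to harder retrieval scenarios.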
Problem

Research questions and friction points this paper is trying to address.

Improve LLMs' search capability without live search engines
Address uncontrolled document quality in search results
Reduce high API costs from frequent search requests
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reinforcement learning without live search engines
Employs curriculum-based rollout for document quality degradation
Achieves search performance surpassing a real search engine with a 14B retrieval module
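To make the search-free rollout idea concrete, here is a minimal sketch of one rollout step: a simulation model returns k documents for a query, each noisy with a curriculum-scheduled probability. The simulation LLM is stubbed with placeholder text and the linear schedule is an illustrative assumption; in ZeroSearch a fine-tuned LLM generates the actual document content.

```python
import random

def simulate_search(query, step, total_steps, k=5, seed=None):
    """Sketch of a search-free rollout step (illustrative, not the paper's
    implementation): return k simulated documents for a query, where each
    document is noisy with a probability that grows over training."""
    rng = random.Random(seed)
    p_noisy = (step / total_steps) * 0.5  # assumption: linear curriculum up to 0.5
    docs = []
    for i in range(k):
        noisy = rng.random() < p_noisy
        style = "noisy" if noisy else "useful"
        # A fine-tuned simulation LLM would generate real document text here,
        # prompted to produce either a useful or a noisy passage for the query.
        docs.append({"text": f"[{style} doc {i} for: {query}]", "noisy": noisy})
    return docs
```

Because the documents come from a local model rather than a live API, rollouts incur no per-request cost and the noise level stays fully controllable.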
👥 Authors
Hao Sun
Tongyi Lab, Alibaba Group
Zile Qiao
Alibaba Tongyi Lab; Peking University
Jiayan Guo
Alibaba DAMO Academy, Peking University
Xuanbo Fan
Tongyi Lab, Alibaba Group
Yingyan Hou
Tongyi Lab, Alibaba Group
Yong Jiang
Tongyi Lab, Alibaba Group
Pengjun Xie
Alibaba Group
Fei Huang
Tongyi Lab, Alibaba Group
Yan Zhang
Tongyi Lab, Alibaba Group