ZeroSearch: Incentivize the Search Capability of LLMs without Searching

📅 2025-05-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two challenges in RL training of LLM search capability: the unpredictable quality of documents returned by real-world search engines and the high API costs of frequent rollouts. It proposes ZeroSearch, a "search-free" reinforcement learning framework. Methodologically: (1) lightweight supervised fine-tuning transforms an LLM into a retrieval module that can generate both relevant and noisy documents in response to a query; (2) a curriculum-based rollout strategy progressively degrades the quality of the generated documents, eliciting robust retrieval and reasoning; the framework is compatible with mainstream RL algorithms and with both base and instruction-tuned models of various sizes. Experiments show that a 3B retrieval module already incentivizes effective search capability, a 7B module matches real search engine performance, and a 14B module surpasses it. Crucially, this paradigm eliminates dependence on external search APIs, enabling efficient, controllable, and low-cost training of LLM search capabilities.

📝 Abstract
Effective information searching is essential for enhancing the reasoning and generation capabilities of large language models (LLMs). Recent research has explored using reinforcement learning (RL) to improve LLMs' search capabilities by interacting with live search engines in real-world environments. While these approaches show promising results, they face two major challenges: (1) Uncontrolled Document Quality: The quality of documents returned by search engines is often unpredictable, introducing noise and instability into the training process. (2) Prohibitively High API Costs: RL training requires frequent rollouts, potentially involving hundreds of thousands of search requests, which incur substantial API expenses and severely constrain scalability. To address these challenges, we introduce ZeroSearch, a reinforcement learning framework that incentivizes the search capabilities of LLMs without interacting with real search engines. Our approach begins with lightweight supervised fine-tuning to transform the LLM into a retrieval module capable of generating both relevant and noisy documents in response to a query. During RL training, we employ a curriculum-based rollout strategy that incrementally degrades the quality of generated documents, progressively eliciting the model's reasoning ability by exposing it to increasingly challenging retrieval scenarios. Extensive experiments demonstrate that ZeroSearch effectively incentivizes the search capabilities of LLMs using a 3B LLM as the retrieval module. Remarkably, a 7B retrieval module achieves comparable performance to the real search engine, while a 14B retrieval module even surpasses it. Furthermore, it generalizes well across both base and instruction-tuned models of various parameter sizes and is compatible with a wide range of RL algorithms.
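The curriculum-based rollout strategy in the abstract can be sketched as a schedule for the probability that a generated document is noisy, rising from easy to hard over training. The exponential interpolation and parameter names below are illustrative assumptions, not the paper's exact formula:

```python
def noise_probability(step, total_steps, p_start=0.0, p_end=0.5, base=4.0):
    """Curriculum schedule (illustrative): probability that the simulated
    retrieval module emits a noisy document at a given training step.

    Interpolates exponentially from p_start (easy: mostly useful documents)
    to p_end (hard: frequent noise) as training progresses.
    """
    frac = step / total_steps  # fraction of training completed, in [0, 1]
    return p_start + (base ** frac - 1) / (base - 1) * (p_end - p_start)
```

Early in training the policy sees mostly useful documents; later steps increasingly mix in noisy ones, exposing the model to harder retrieval scenarios.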
Problem

Research questions and friction points this paper is trying to address.

Improve LLMs' search capability without live search engines
Address uncontrolled document quality in search results
Reduce high API costs from frequent search requests
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reinforcement learning without live search engines
Employs curriculum-based rollout for document quality degradation
Achieves search performance surpassing a real search engine with a 14B retrieval module
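To make the search-free rollout idea concrete, here is a minimal sketch of one rollout step: a simulation model returns k documents for a query, each noisy with a curriculum-scheduled probability. The simulation LLM is stubbed with placeholder text and the linear schedule is an illustrative assumption; in ZeroSearch a fine-tuned LLM generates the actual document content.

```python
import random

def simulate_search(query, step, total_steps, k=5, seed=None):
    """Sketch of a search-free rollout step (illustrative, not the paper's
    implementation): return k simulated documents for a query, where each
    document is noisy with a probability that grows over training."""
    rng = random.Random(seed)
    p_noisy = (step / total_steps) * 0.5  # assumption: linear curriculum up to 0.5
    docs = []
    for i in range(k):
        noisy = rng.random() < p_noisy
        style = "noisy" if noisy else "useful"
        # A fine-tuned simulation LLM would generate real document text here,
        # prompted to produce either a useful or a noisy passage for the query.
        docs.append({"text": f"[{style} doc {i} for: {query}]", "noisy": noisy})
    return docs
```

Because the documents come from a local model rather than a live API, rollouts incur no per-request cost and the noise level stays fully controllable.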
👥 Authors
Hao Sun
Tongyi Lab, Alibaba Group
Zile Qiao
Alibaba Tongyi Lab; Peking University
Jiayan Guo
Alibaba DAMO Academy, Peking University
Xuanbo Fan
Tongyi Lab, Alibaba Group
Yingyan Hou
Tongyi Lab, Alibaba Group
Yong Jiang
Tongyi Lab, Alibaba Group
Pengjun Xie
Alibaba Group
Fei Huang
Tongyi Lab, Alibaba Group
Yan Zhang
Tongyi Lab, Alibaba Group