R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

📅 2025-03-07
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) rely heavily on internal knowledge, leading to hallucinations on time-sensitive and knowledge-intensive tasks. To address this, we propose R1-Searcher, a two-stage, purely outcome-based reinforcement learning framework that enables LLMs to autonomously invoke external search systems during reasoning to acquire accurate, up-to-date information. Crucially, the method requires no process-based reward signals and no knowledge distillation for cold-start initialization. It supports both base and instruction-tuned LLMs and generalizes to out-of-domain datasets. Experiments show that R1-Searcher significantly outperforms strong retrieval-augmented generation (RAG) baselines, even when compared against the closed-source GPT-4o-mini.
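The core mechanism the summary describes is an inference-time loop: the policy model generates until it emits a search call, the retrieved documents are appended to the trace, and generation resumes conditioned on the evidence. A minimal sketch of that loop, assuming hypothetical `<search>…</search>` and `<documents>…</documents>` tags and stand-in `model_step`/`search_api` callables (the paper's actual special tokens and retrieval backend may differ):

```python
import re

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)

def generate_with_search(model_step, search_api, question, max_turns=4):
    """Roll out a reasoning trace, pausing whenever the model emits a
    <search>query</search> tag to fetch external evidence.

    model_step(trace) -> next text chunk from the policy LLM (stub here).
    search_api(query) -> retrieved passages as a string (stub here).
    """
    trace = question
    for _ in range(max_turns):
        chunk = model_step(trace)   # model continues the trace
        trace += chunk
        m = SEARCH_RE.search(chunk)
        if not m:                   # no search call: trace ends with the answer
            break
        docs = search_api(m.group(1).strip())
        # Retrieved evidence is spliced into the context so the next
        # generation step can condition on it.
        trace += f"\n<documents>{docs}</documents>\n"
    return trace
```

Because the search call is part of the sampled rollout, the RL objective can reward or penalize the whole trace without any per-step supervision, which is what makes the approach "purely outcome-based."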

๐Ÿ“ Abstract
Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarkable performance on challenging tasks such as mathematics and coding, they often rely on their internal knowledge to solve problems, which can be inadequate for time-sensitive or knowledge-intensive questions, leading to inaccuracies and hallucinations. To address this, we propose **R1-Searcher**, a novel two-stage outcome-based RL approach designed to enhance the search capabilities of LLMs. This method allows LLMs to autonomously invoke external search systems to access additional knowledge during the reasoning process. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start, effectively generalizing to out-of-domain datasets and supporting both Base and Instruct models. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs' search capabilities via reinforcement learning.
Address inaccuracies in time-sensitive or knowledge-intensive tasks.
Enable LLMs to autonomously access external knowledge systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage RL that enhances LLM search capabilities.
Autonomous invocation of external search systems for additional knowledge.
Outperforms strong RAG methods without process rewards or distillation.
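The "without process rewards" point means the training signal scores only the finished rollout. A minimal sketch of such an outcome-only reward, assuming a hypothetical `<answer>…</answer>` tag, a simple format check, and case-insensitive exact-match scoring (the paper's two-stage scheme and exact reward values are simplified here):

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)

def outcome_reward(completion, gold_answer):
    """Score a full rollout: no per-step supervision, just a format
    check plus a final-answer check (illustrative values, not the
    paper's exact coefficients)."""
    m = ANSWER_RE.search(completion)
    if m is None:
        return -1.0  # penalize rollouts that never produce a tagged answer
    pred = m.group(1).strip().lower()
    return 1.0 if pred == gold_answer.strip().lower() else 0.0
```

Since the reward depends only on the final answer, the policy is free to discover when and what to search on its own, which is the sense in which the search behavior is "incentivized" rather than supervised.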
🔎 Similar Papers
No similar papers found.