R-Search: Empowering LLM Reasoning with Search via Multi-Reward Reinforcement Learning

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to autonomously plan reasoning and search interaction trajectories for complex logical and knowledge-intensive tasks. Method: This paper proposes the first end-to-end reinforcement learning framework that jointly optimizes reasoning–search coordination paths via multi-reward optimization. It integrates dynamic retrieval triggering, multi-stage policy gradient optimization, and global evidence fusion to enable joint decision-making between reasoning steps and search actions, as well as coherent integration of retrieved evidence. Contribution/Results: Evaluated on seven benchmark datasets, our method significantly outperforms state-of-the-art RAG baselines, achieving up to 32.2% and 25.1% improvements in in-domain and out-of-domain performance, respectively. It is the first approach to realize global, joint optimization of reasoning–search interaction paths, establishing a novel paradigm for LLM-augmented reasoning.

📝 Abstract
Large language models (LLMs) have notably progressed in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interactions with search remains a non-trivial challenge, as models often fail to identify optimal reasoning-search interaction trajectories, resulting in suboptimal responses. We propose R-Search, a novel reinforcement learning framework for Reasoning-Search integration, designed to enable LLMs to autonomously execute multi-step reasoning with deep search interaction, and to learn optimal reasoning-search interaction trajectories via multi-reward signals, improving response quality in complex logic- and knowledge-intensive tasks. R-Search guides the LLM to dynamically decide when to retrieve or reason, while globally integrating key evidence to enhance deep knowledge interaction between reasoning and search. During RL training, R-Search provides multi-stage, multi-type rewards to jointly optimize the reasoning-search trajectory. Experiments on seven datasets show that R-Search outperforms advanced RAG baselines by up to 32.2% (in-domain) and 25.1% (out-of-domain). The code and data are available at https://github.com/QingFei1/R-Search.
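The abstract describes an agent that interleaves reasoning with retrieval, deciding at each step whether to search or answer and fusing retrieved evidence into its context. A minimal, hypothetical sketch of such a reason-or-retrieve loop is below; all function names and the stand-in policy are illustrative, not from the R-Search codebase:

```python
# Hypothetical sketch of a reasoning-search interaction loop, as described
# in the abstract. The policy and retriever here are mocks for illustration;
# the actual implementation is in the paper's repository.

def mock_policy_step(context: str):
    """Stand-in for the LLM policy: request a search once, then answer."""
    if "[EVIDENCE]" not in context:
        return ("SEARCH", "capital of France")
    return ("ANSWER", "Paris")

def mock_retrieve(query: str) -> str:
    """Stand-in retriever returning a tagged evidence snippet."""
    return f"[EVIDENCE] {query}: Paris is the capital of France."

def reason_search_loop(question: str, max_steps: int = 4):
    context = question
    for _ in range(max_steps):
        action, payload = mock_policy_step(context)
        if action == "SEARCH":
            # Evidence is appended to the running context so later reasoning
            # steps can condition on everything retrieved so far.
            context += "\n" + mock_retrieve(payload)
        else:
            return payload
    return None  # step budget exhausted without a final answer

print(reason_search_loop("What is the capital of France?"))  # -> Paris
```

During RL training, whole trajectories produced by a loop like this would be scored by the multi-reward signals and used to update the policy.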
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning with deep search interaction
Optimizing reasoning-search trajectories via multi-reward RL
Improving response quality in logic-intensive tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-reward reinforcement learning for reasoning-search integration
Dynamic decision-making between retrieval and reasoning
Multi-stage rewards optimize reasoning-search trajectories
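The innovation list mentions multi-stage, multi-type rewards that jointly score the reasoning-search trajectory. One simple way such signals could be combined is a weighted sum over an outcome reward, an evidence-quality reward, and a format reward; the reward names and weights below are illustrative assumptions, not values from the paper:

```python
# Hypothetical combination of multi-type rewards for one trajectory.
# Components and weights are illustrative assumptions only.

def trajectory_reward(answer_correct: bool,
                      evidence_relevance: float,
                      format_valid: bool,
                      w_answer: float = 1.0,
                      w_evidence: float = 0.5,
                      w_format: float = 0.2) -> float:
    reward = 0.0
    reward += w_answer * (1.0 if answer_correct else 0.0)   # outcome reward on the final answer
    reward += w_evidence * evidence_relevance               # intermediate reward on retrieved evidence
    reward += w_format * (1.0 if format_valid else 0.0)     # structural reward for a well-formed trajectory
    return reward

print(trajectory_reward(True, 0.8, True))  # 1.0 + 0.4 + 0.2 = 1.6
```

In a policy-gradient setup, this scalar would serve as the return for the sampled reasoning-search trajectory.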
Qingfei Zhao
University of Chinese Academy of Sciences
Natural Language Processing · Artificial Intelligence
Ruobing Wang
University of Chinese Academy of Sciences
LLM · RAG
Dingling Xu
Computer Science student, Beijing Normal University
NLP · LLM
Daren Zha
Institute of Information Engineering, Chinese Academy of Sciences
Limin Liu
Institute of Information Engineering, Chinese Academy of Sciences