🤖 AI Summary
Large language models (LLMs) struggle to autonomously plan reasoning and search interaction trajectories for complex logical and knowledge-intensive tasks. Method: This paper proposes the first end-to-end reinforcement learning framework that jointly optimizes reasoning–search coordination paths via multi-reward optimization. It integrates dynamic retrieval triggering, multi-stage policy gradient optimization, and global evidence fusion, enabling joint decision-making between reasoning steps and search actions as well as coherent integration of retrieved evidence. Contribution/Results: Evaluated on seven benchmark datasets, the proposed method significantly outperforms state-of-the-art RAG baselines, achieving improvements of up to 32.2% in-domain and 25.1% out-of-domain. It is the first approach to realize global, joint optimization of reasoning–search interaction paths, establishing a novel paradigm for LLM-augmented reasoning.
📝 Abstract
Large language models (LLMs) have made notable progress in multi-step and long-chain reasoning. However, extending their reasoning capabilities to encompass deep interactions with search remains a non-trivial challenge, as models often fail to identify optimal reasoning–search interaction trajectories, resulting in suboptimal responses. We propose R-Search, a novel reinforcement learning framework for Reasoning–Search integration, designed to enable LLMs to autonomously execute multi-step reasoning with deep search interaction and to learn optimal reasoning–search interaction trajectories via multi-reward signals, improving response quality in complex logic- and knowledge-intensive tasks. R-Search guides the LLM to dynamically decide when to retrieve or reason, while globally integrating key evidence to enhance deep knowledge interaction between reasoning and search. During RL training, R-Search provides multi-stage, multi-type rewards to jointly optimize the reasoning–search trajectory. Experiments on seven datasets show that R-Search outperforms advanced RAG baselines by up to 32.2% (in-domain) and 25.1% (out-of-domain). The code and data are available at https://github.com/QingFei1/R-Search.
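To make the "multi-stage, multi-type rewards" idea concrete, here is a minimal, hypothetical sketch of how several reward signals over one reasoning–search trajectory could be scalarized for policy-gradient training. This is not the paper's implementation; the reward names (`r_answer`, `r_evidence`, `r_format`) and weights are illustrative assumptions only.

```python
# Illustrative sketch, NOT the authors' implementation: scalarizing
# multiple reward types for a single reasoning-search trajectory.
# All names and weights below are hypothetical assumptions.

from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: str  # "reason" or "search"
    text: str    # generated reasoning or retrieved evidence


def trajectory_reward(steps: List[Step],
                      answer_correct: bool,
                      evidence_relevance: float,
                      w_answer: float = 1.0,
                      w_evidence: float = 0.5,
                      w_format: float = 0.1) -> float:
    """Combine multi-type rewards for one trajectory into one scalar."""
    # Outcome reward: did the final answer match the reference?
    r_answer = 1.0 if answer_correct else 0.0
    # Evidence reward: judged relevance of retrieved passages, clipped to [0, 1].
    r_evidence = max(0.0, min(1.0, evidence_relevance))
    # Format reward: trajectory must end with a reasoning step (final answer).
    r_format = 1.0 if steps and steps[-1].action == "reason" else 0.0
    return w_answer * r_answer + w_evidence * r_evidence + w_format * r_format


traj = [Step("reason", "decompose the question"),
        Step("search", "retrieved supporting passage"),
        Step("reason", "synthesize final answer")]
score = trajectory_reward(traj, answer_correct=True, evidence_relevance=0.8)
print(round(score, 2))  # 1.5
```

In an actual RL loop, a scalar like `score` would weight the log-probabilities of the trajectory's generated tokens in a policy-gradient update; the sketch only shows the reward-fusion step.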