RAG-R1: Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism

πŸ“… 2025-06-30
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Large language models (LLMs) suffer from hallucinations and outdated responses due to static, frozen knowledge. Existing retrieval-augmented generation (RAG) approaches face bottlenecks in training stability, inference latency, and single-query constraints. This paper proposes a multi-query parallel RAG framework integrated with a reinforcement learning (RL)-driven dynamic retrieval-reasoning co-optimization mechanism, breaking away from conventional sequential, single-query paradigms to enable efficient synergy between external and internal parametric knowledge. Key innovations include: (1) parallel multi-query generation and joint retrieval, substantially reducing retrieval latency; and (2) an end-to-end RL policy network that jointly optimizes retrieval intent and reasoning paths. Evaluated on seven QA benchmarks, the method achieves an accuracy gain of up to 13.2% over the strongest baseline while reducing inference time by 11.1%, demonstrating a superior performance-efficiency trade-off.
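The parallel multi-query retrieval idea in the summary can be sketched roughly as follows. This is a minimal illustration only: the `retrieve` function and its toy keyword corpus are hypothetical stand-ins for a real retriever, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy corpus standing in for a real retrieval backend.
_CORPUS = {
    "rag": "Retrieval-augmented generation grounds LLM outputs in external documents.",
    "rl": "Reinforcement learning optimizes a policy from reward signals.",
    "latency": "Parallel retrieval amortizes per-query latency across queries.",
}

def retrieve(query: str) -> list[str]:
    """Return corpus passages whose keyword appears in the query (toy retriever)."""
    q = query.lower()
    return [text for key, text in _CORPUS.items() if key in q]

def parallel_retrieve(queries: list[str]) -> list[str]:
    """Issue all sub-queries concurrently and merge results, deduplicated.

    Instead of one sequential retrieval round per query, all sub-queries
    generated for a question are dispatched at once, so wall-clock latency
    is bounded by the slowest single retrieval rather than their sum.
    """
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        results = pool.map(retrieve, queries)  # preserves query order
    merged: list[str] = []
    for passages in results:
        for p in passages:
            if p not in merged:
                merged.append(p)
    return merged

docs = parallel_retrieve(["What is RAG?", "How does RL help?", "Why lower latency?"])
```

In the paper's framework the sub-queries come from the model itself and the retrieval-reasoning loop is further optimized end-to-end with RL; the sketch only shows the latency side of the parallelism.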

πŸ“ Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, while they remain prone to generating hallucinated or outdated responses due to their static internal knowledge. Recent advancements in Retrieval-Augmented Generation (RAG) methods have explored enhancing models' search and reasoning capabilities through reinforcement learning (RL). Although these methods demonstrate promising results, they face challenges in training stability and encounter issues such as substantial inference time and restricted capabilities due to the single-query mode. In this paper, we propose RAG-R1, a novel training framework designed to enable LLMs to adaptively leverage internal and external knowledge during the reasoning process. We further expand the generation and retrieval processes within the framework from single-query mode to multi-query parallelism, aimed at reducing inference time and enhancing the model's capabilities. Extensive experiments on seven question-answering benchmarks demonstrate that our method outperforms the strongest baseline by up to 13.2% and decreases inference time by 11.1%.
Problem

Research questions and friction points this paper is trying to address.

Hallucinated and outdated responses caused by LLMs' static internal knowledge
Training instability and high inference latency in existing RL-based RAG methods
Restricted capability of the single-query retrieval mode
Innovation

Methods, ideas, or system contributions that make the work stand out.

RL training framework for adaptive use of internal and external knowledge
Multi-query parallel generation and joint retrieval
Up to 13.2% accuracy gain and 11.1% lower inference time
Zhiwen Tan
AWorld Team, Inclusion AI
Jiaming Huang
AWorld Team, Inclusion AI
Qintong Wu
Ant Group
Hongxuan Zhang
AWorld Team, Inclusion AI
Chenyi Zhuang
AIST, AIRC
Jinjie Gu
Ant Group