Med-R$^3$: Enhancing Medical Retrieval-Augmented Reasoning of LLMs via Progressive Reinforcement Learning

📅 2025-07-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address critical challenges in medical retrieval-augmented generation (RAG)—including the decoupling of retrieval and reasoning, limited generalization from supervised fine-tuning, and domain-agnostic reward design—this paper introduces Med-R³, the first progressive reinforcement learning (RL) optimization framework tailored for healthcare. Med-R³ decouples and jointly optimizes reasoning and retrieval: it first strengthens logical reasoning capabilities, then adaptively refines knowledge retrieval to avoid path dependency. A multi-dimensional medical reward function is proposed, grounded in clinical consistency, evidential support, and diagnostic plausibility. Furthermore, Med-R³ enables dynamic interaction between large language models (LLMs) and external medical knowledge bases. Under identical parameter scales, LLaMA3.1-8B-Instruct integrated with Med-R³ outperforms GPT-4o-mini by 3.93%, while Qwen2.5-14B achieves a 13.53% improvement—demonstrating substantial gains in medical reasoning and evidence-based decision-making.

📝 Abstract
In medical scenarios, effectively retrieving external knowledge and leveraging it for rigorous logical reasoning is of significant importance. Despite their potential, existing work has predominantly focused on enhancing either retrieval or reasoning capabilities of the models in isolation, with little attention given to their joint optimization, which leads to limited coordination between the two processes. Additionally, current methods rely heavily on supervised fine-tuning (SFT), which can cause models to memorize existing problem-solving pathways, thereby restricting their generalization ability when confronted with novel problem contexts. Furthermore, while some studies have explored improving retrieval-augmented reasoning in general domains via reinforcement learning, their reward function designs do not adequately capture the specific demands of the medical domain. To address these challenges, we introduce **Med-R$^3$**, a **Med**ical **R**etrieval-augmented **R**easoning framework driven by progressive **R**einforcement learning. In this framework, we first develop the model's ability to perform logical reasoning over medical problems. Subsequently, on the basis of this foundation, we adaptively optimize the retrieval capability to better align with the characteristics of the knowledge corpus and external information utilization throughout the reasoning process. Finally, we conduct joint optimization of the model's retrieval and reasoning coordination. Extensive experiments indicate that **Med-R$^3$** achieves state-of-the-art performance, with LLaMA3.1-8B-Instruct + Med-R$^3$ surpassing closed-source GPT-4o-mini by 3.93% at a comparable parameter scale, while Qwen2.5-14B augmented with Med-R$^3$ shows a more substantial gain of 13.53%.
Problem

Research questions and friction points this paper is trying to address.

Joint optimization of retrieval and reasoning in medical LLMs
Reducing over-reliance on supervised fine-tuning for generalization
Adapting reinforcement learning for medical domain-specific rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive reinforcement learning for medical reasoning
Joint optimization of retrieval and reasoning
Domain-specific reward function design
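To make the reward-design idea concrete, below is a minimal sketch of how the paper's three reward dimensions (clinical consistency, evidential support, diagnostic plausibility) might be combined into a single scalar for RL training. The weights and the linear-combination form are assumptions for illustration, not the paper's exact design:

```python
# Hypothetical sketch of a multi-dimensional medical reward.
# ASSUMPTION: each component scorer returns a value in [0, 1] and the
# final reward is a weighted sum; the paper may use a different scheme.

def medical_reward(clinical_consistency: float,
                   evidential_support: float,
                   diagnostic_plausibility: float,
                   weights=(0.4, 0.3, 0.3)) -> float:
    """Blend the three reward dimensions into one scalar in [0, 1]."""
    components = (clinical_consistency, evidential_support, diagnostic_plausibility)
    for score in components:
        if not 0.0 <= score <= 1.0:
            raise ValueError("each component score must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, components))

# Example: a rollout judged clinically consistent (0.9), well supported
# by retrieved evidence (0.8), and diagnostically plausible (0.7).
print(medical_reward(0.9, 0.8, 0.7))
```

In a progressive RL setup, such a scalar would serve as the optimization signal at each stage (reasoning first, then retrieval, then joint coordination), with the weights tuned per stage.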
Keer Lu
Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University
Zheng Liang
Baichuan Inc.
Youquan Li
Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University
Jiejun Tan
PhD Student, Renmin University of China
Natural Language Processing · Information Retrieval
Da Pan
Baichuan Inc.
Shusen Zhang
Baichuan Inc.
Guosheng Dong
Baichuan Inc.
Huang Leng
School of CS & Key Lab of High Confidence Software Technologies (MOE), Peking University