🤖 AI Summary
To address critical challenges in medical retrieval-augmented generation (RAG), including the decoupling of retrieval and reasoning, limited generalization from supervised fine-tuning, and domain-agnostic reward design, this paper introduces Med-R³, the first progressive reinforcement learning (RL) optimization framework tailored for healthcare. Med-R³ decouples and jointly optimizes reasoning and retrieval: it first strengthens logical reasoning capabilities, then adaptively refines knowledge retrieval to avoid path dependency, and finally optimizes their coordination. A multi-dimensional medical reward function is proposed, grounded in clinical consistency, evidential support, and diagnostic plausibility. Furthermore, Med-R³ enables dynamic interaction between large language models (LLMs) and external medical knowledge bases. At comparable parameter scales, LLaMA3.1-8B-Instruct integrated with Med-R³ outperforms GPT-4o-mini by 3.93%, while Qwen2.5-14B achieves a 13.53% improvement, demonstrating substantial gains in medical reasoning and evidence-based decision-making.
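The summary names three reward dimensions but not how they are combined. A minimal sketch of one plausible combination, assuming each dimension is scored in [0, 1] and merged by a weighted sum, is shown below. The scorer inputs, class name, and weights are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a multi-dimensional medical reward. The three
# dimensions (clinical consistency, evidential support, diagnostic
# plausibility) come from the summary; the weighted-sum combination and
# the weight values are assumptions for illustration only.
from dataclasses import dataclass


@dataclass
class MedicalReward:
    w_consistency: float = 0.4    # weight for clinical consistency
    w_evidence: float = 0.3      # weight for evidential support
    w_plausibility: float = 0.3  # weight for diagnostic plausibility

    def __call__(self, consistency: float, evidence: float,
                 plausibility: float) -> float:
        """Combine per-dimension scores (each in [0, 1]) into one scalar."""
        for score in (consistency, evidence, plausibility):
            if not 0.0 <= score <= 1.0:
                raise ValueError("each score must lie in [0, 1]")
        return (self.w_consistency * consistency
                + self.w_evidence * evidence
                + self.w_plausibility * plausibility)


reward = MedicalReward()
print(round(reward(0.9, 0.8, 0.7), 2))  # 0.81
```

A scalar reward of this shape is what a standard policy-gradient RL step consumes; richer designs could instead gate the dimensions (e.g., zeroing the reward when evidential support is absent).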
📝 Abstract
In medical scenarios, effectively retrieving external knowledge and leveraging it for rigorous logical reasoning is of significant importance. Despite their potential, existing works have predominantly focused on enhancing either the retrieval or the reasoning capability of models in isolation, with little attention to their joint optimization, which leads to limited coordination between the two processes. Additionally, current methods rely heavily on supervised fine-tuning (SFT), which can cause models to memorize existing problem-solving pathways, thereby restricting their generalization ability when confronted with novel problem contexts. Furthermore, while some studies have explored improving retrieval-augmented reasoning in general domains via reinforcement learning, their reward function designs do not adequately capture the specific demands of the medical domain. To address these challenges, we introduce **Med-R$^3$**, a **Med**ical **R**etrieval-augmented **R**easoning framework driven by progressive **R**einforcement learning. In this framework, we first develop the model's ability to perform logical reasoning over medical problems. Building on this foundation, we then adaptively optimize the retrieval capability to better align with the characteristics of the knowledge corpus and with how external information is used throughout the reasoning process. Finally, we jointly optimize the coordination between the model's retrieval and reasoning. Extensive experiments indicate that **Med-R$^3$** achieves state-of-the-art performance: LLaMA3.1-8B-Instruct + Med-R$^3$ surpasses the closed-source GPT-4o-mini by 3.93% at a comparable parameter scale, while Qwen2.5-14B augmented with Med-R$^3$ shows a more substantial gain of 13.53%.
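The three-stage progressive schedule described above (reasoning first, then retrieval, then joint coordination) can be sketched as a simple training driver. The trainer interface, stage labels, and data arguments here are hypothetical placeholders; the actual RL update (e.g., a policy-gradient step) is abstracted behind `rl_step`.

```python
# Illustrative sketch of the progressive three-stage optimization order
# described in the abstract. `rl_step` stands in for a real RL update and
# is an assumed interface, not the paper's implementation.
def progressive_training(model, reasoning_data, retrieval_data, rl_step):
    # Stage 1: strengthen logical reasoning over medical problems.
    for batch in reasoning_data:
        rl_step(model, batch, optimize="reasoning")

    # Stage 2: on that foundation, adapt retrieval to the knowledge corpus.
    for batch in retrieval_data:
        rl_step(model, batch, optimize="retrieval")

    # Stage 3: jointly optimize retrieval-reasoning coordination.
    for batch in retrieval_data:
        rl_step(model, batch, optimize="joint")

    return model
```

The point of the sketch is the ordering: retrieval is only tuned after reasoning is in place, which is how the framework avoids the path dependency that joint training from scratch can induce.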