Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning

📅 2025-10-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current large language models (LLMs) achieve strong performance on medical benchmarks but lack the strategic diagnostic questioning and empathetic communication required in real-world clinical settings. To address this, we propose an experience-driven multi-agent reinforcement learning framework that, for the first time, decouples and jointly optimizes clinical decision accuracy and empathetic dialogue proficiency. Our method establishes a multi-agent interactive environment with a dual-layer reward mechanism (clinical correctness plus conversational quality) and integrates an experience replay buffer to enhance policy learning. The agent is trained on high-quality, expert-annotated clinical dialogue trajectories, combining LLMs, multi-agent systems, and experience replay techniques. Experiments demonstrate that our AI physician significantly outperforms leading open-source domain-specific models and multiple closed-source foundation models on HealthBench and MAQuE, while achieving superior parameter efficiency. Human evaluation by clinical experts further confirms a strong preference for its multi-turn, empathetic diagnostic dialogues.
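The dual-layer reward described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the scorer fields, weights, and linear combination are all assumptions for exposition.

```python
from dataclasses import dataclass

# Hypothetical sketch of a dual-layer reward: one layer scores clinical
# correctness, the other scores conversational quality, and the two are
# combined into a single scalar for policy updates. Weights are illustrative.

@dataclass
class TurnScores:
    clinical_correctness: float    # e.g. diagnosis/plan accuracy in [0, 1]
    conversational_quality: float  # e.g. empathy/strategy rating in [0, 1]

def dual_layer_reward(scores: TurnScores,
                      w_clinical: float = 0.7,
                      w_dialogue: float = 0.3) -> float:
    """Combine the two reward layers into one training signal."""
    return (w_clinical * scores.clinical_correctness
            + w_dialogue * scores.conversational_quality)

reward = dual_layer_reward(TurnScores(clinical_correctness=0.9,
                                      conversational_quality=0.6))
```

Decoupling the two layers lets each be tuned or ablated independently, which is what allows the framework to optimize decision accuracy and dialogue proficiency jointly rather than through a single opaque score.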

📝 Abstract
The professionalism of a human doctor in outpatient service depends on two core abilities: making accurate medical decisions and conducting strategic, empathetic patient inquiry. Existing Large Language Models (LLMs) have achieved remarkable accuracy on medical decision-making benchmarks. However, they often lack the ability to conduct strategic, empathetic consultations, which is essential in real-world clinical scenarios. To address this gap, we propose Doctor-R1, an AI doctor agent trained to master both capabilities by asking high-yield questions and conducting strategic multi-turn inquiry to guide decision-making. Our framework introduces three key components: a multi-agent interactive environment, a two-tiered reward architecture that separately optimizes clinical decision-making and communicative inquiry skills, and an experience repository that grounds policy learning in high-quality prior trajectories. We evaluate Doctor-R1 on OpenAI's HealthBench and MAQuE, assessed across multi-facet metrics such as communication quality, user experience, and task accuracy. Remarkably, Doctor-R1 surpasses state-of-the-art open-source specialized LLMs by a substantial margin with higher parameter efficiency and outperforms powerful proprietary models. Furthermore, human evaluations show a strong preference for the clinical dialogues Doctor-R1 generates, demonstrating the effectiveness of the framework.
Problem

Research questions and friction points this paper is trying to address.

Enhancing AI's strategic and empathetic patient inquiry skills
Bridging the gap between medical decision accuracy and consultation quality
Developing agentic reinforcement learning for clinical dialogue optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent interactive environment for clinical training
Two-tiered reward system optimizing decision and communication
Experience repository grounding policy in prior trajectories
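The experience repository in the last bullet can be sketched as a filtered trajectory store: only dialogues judged high-quality are retained, and batches are sampled from them to ground policy updates. The threshold, capacity, and interfaces below are illustrative assumptions, not the paper's exact design.

```python
import random
from collections import deque

# Hypothetical sketch of an experience repository: it admits only dialogue
# trajectories whose judged quality clears a threshold, evicts the oldest
# entries when full, and serves random batches for policy learning.

class ExperienceRepository:
    def __init__(self, capacity: int = 1000, min_quality: float = 0.8):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop off when full
        self.min_quality = min_quality

    def add(self, trajectory: list, quality: float) -> bool:
        """Store the trajectory only if its quality clears the threshold."""
        if quality >= self.min_quality:
            self.buffer.append((trajectory, quality))
            return True
        return False

    def sample(self, k: int) -> list:
        """Draw up to k stored trajectories for a policy update."""
        return random.sample(list(self.buffer), min(k, len(self.buffer)))

repo = ExperienceRepository()
repo.add([("doctor", "What brings you in today?"),
          ("patient", "Chest pain since this morning.")], quality=0.92)
```

Gating admission on quality is what makes the buffer an "experience repository" rather than a plain replay buffer: policy learning is grounded only in trajectories worth imitating.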
Authors
Yunghwei Lai, Institute for AI Industry Research, Tsinghua University (LLM Agent | AI Healthcare)
Kaiming Liu, Tsinghua University (LLM, Autonomous Agent)
Ziyue Wang, Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University
Weizhi Ma, Tsinghua University (LLM and Agents, Recommendation, AI for Healthcare)
Yang Liu, Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University; Institute for AI Industry Research (AIR), Tsinghua University