ProMed: Shapley Information Gain Guided Reinforcement Learning for Proactive Medical LLMs

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing medical large language models (LLMs) adopt a passive response paradigm in interactive diagnosis, often leading to misdiagnosis due to insufficient information gathering. Method: We propose a proactive medical LLM that autonomously poses high-informativeness clinical questions prior to diagnosis, guided by reinforcement learning. Our approach integrates Monte Carlo Tree Search, Shapley value computation, and a novel Shapley Information Gain (SIG) reward mechanism—jointly quantifying both question informativeness and contextual relevance—within a two-stage training framework. Contribution/Results: The SIG-driven reward design enables precise optimization of clinically informative questioning. Experiments on two partial-information medical benchmarks demonstrate that our model outperforms state-of-the-art methods by 6.29% on average and improves over passive baselines by 54.45%. Moreover, it exhibits strong out-of-domain generalization capability.

📝 Abstract
Interactive medical questioning is essential in real-world clinical consultations, where physicians must actively gather information from patients. While medical Large Language Models (LLMs) have shown impressive capabilities in static medical question answering, they predominantly operate under a reactive paradigm: generating answers directly without seeking additional information, which risks incorrect diagnoses in such interactive settings. To address this limitation, we propose ProMed, a reinforcement learning (RL) framework that transitions medical LLMs toward a proactive paradigm, equipping them with the ability to ask clinically valuable questions before decision-making. At the core of ProMed is the Shapley Information Gain (SIG) reward, which quantifies the clinical utility of each question by combining the amount of newly acquired information with its contextual importance, estimated via Shapley values. We integrate SIG into a two-stage training pipeline: (1) SIG-Guided Model Initialization, which uses Monte Carlo Tree Search (MCTS) to construct high-reward interaction trajectories to supervise the model, and (2) SIG-Augmented Policy Optimization, which enhances RL with a novel SIG-guided Reward Distribution Mechanism that assigns higher rewards to informative questions for targeted optimization. Extensive experiments on two newly curated partial-information medical benchmarks demonstrate that ProMed significantly outperforms state-of-the-art methods by an average of 6.29% and delivers a 54.45% gain over the reactive paradigm, while also generalizing robustly to out-of-domain cases.
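The abstract attributes each question's contextual importance to Shapley values. As a rough illustration of what that entails (not the paper's implementation), the sketch below computes exact Shapley values over subsets of asked questions, with a toy value function standing in for diagnostic utility:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating all coalitions.

    players:  list of items (e.g. questions asked during a consultation)
    value_fn: maps a frozenset of players to a scalar utility
              (here a toy stand-in for diagnostic usefulness)
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Standard Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (value_fn(s | {p}) - value_fn(s))
    return phi

# Toy value function: utility = number of distinct clinical facts covered.
facts = {"q1": {"fever"}, "q2": {"fever", "cough"}, "q3": {"rash"}}
value = lambda s: len(set().union(*(facts[q] for q in s)) if s else set())
phi = shapley_values(list(facts), value)  # q2 covers the most new facts
```

Note that exact computation enumerates all 2^n coalitions, so practical systems typically rely on sampled approximations; the toy value function above is an assumption for illustration only.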
Problem

Research questions and friction points this paper is trying to address.

Transitioning medical LLMs from reactive to proactive questioning paradigm
Quantifying clinical utility of questions via Shapley Information Gain
Addressing incorrect diagnosis risks in interactive medical consultations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shapley Information Gain reward for clinical utility
Two-stage training with SIG-guided initialization
SIG-augmented policy optimization with reward distribution
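The last bullet mentions a SIG-guided Reward Distribution Mechanism that assigns higher rewards to informative questions. One hypothetical way such turn-level credit assignment could look (a sketch, not the paper's exact rule) is to spread a trajectory-level reward across turns in proportion to a softmax over per-question SIG scores:

```python
import math

def distribute_reward(trajectory_reward, sig_scores, temperature=1.0):
    """Hypothetical turn-level credit assignment: weight each turn's
    share of the trajectory reward by a softmax over its SIG score,
    so informative questions receive a larger slice."""
    exps = [math.exp(s / temperature) for s in sig_scores]
    total = sum(exps)
    return [trajectory_reward * e / total for e in exps]

# Three questions in one trajectory; the second was most informative.
turn_rewards = distribute_reward(1.0, [0.2, 1.4, 0.1])
```

The softmax weighting and temperature parameter are assumptions for illustration; the point is simply that a shared outcome reward can be reshaped so the policy gradient concentrates on the questions that actually contributed information.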
Hongxin Ding
School of Computer Science, Peking University, Beijing, China
Baixiang Huang
Emory University
Yue Fang
School of Computer Science, Peking University, Beijing, China
Weibin Liao
Peking University
Xinke Jiang
School of Computer Science, Peking University, Beijing, China
Zheng Li
School of Computer Science, Peking University, Beijing, China
Junfeng Zhao
Yasha Wang
Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China; National Engineering Research Center For Software Engineering, Peking University, Beijing, China