MedCEG: Reinforcing Verifiable Medical Reasoning with Critical Evidence Graph

📅 2025-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinical large language models lack rigorous validation of their reasoning processes, hindering trustworthy deployment in clinical decision support. To address this, we propose the Critical Evidence Graph (CEG), a verifiable and traceable structured reasoning-path framework that, for the first time, explicitly constrains medical reasoning to evidence-based nodes and causal chains. Methodologically, we design a three-dimensional reward function measuring node coverage, structural correctness, and chain completeness, integrate it with clinical-knowledge-guided Proximal Policy Optimization (PPO) reinforcement learning, and build an algorithmic CEG-generation pipeline. Our approach achieves significant improvements over state-of-the-art methods across multiple medical reasoning benchmarks, and the generated reasoning chains receive high clinical credibility scores from domain experts (mean 4.82/5.0). We publicly release our code, models, and a challenging-case dataset to foster reproducible research and clinical evaluation.


📝 Abstract
Large language models with reasoning capabilities have demonstrated impressive performance across a wide range of domains. In clinical applications, a transparent, step-by-step reasoning process provides physicians with strong evidence to support decision-making. While reinforcement learning has effectively enhanced reasoning performance in medical contexts, the clinical reliability of these reasoning processes remains limited because their accuracy and validity are often overlooked during training. To address this gap, we propose MedCEG, a framework that augments medical language models with clinically valid reasoning pathways by explicitly supervising the reasoning process through a Critical Evidence Graph (CEG). We curate a dataset of challenging clinical cases and algorithmically construct a CEG for each sample to represent a high-quality verifiable reasoning pathway. To guide the reasoning process, we introduce a Clinical Reasoning Procedure Reward, which evaluates Node Coverage, Structural Correctness, and Chain Completeness, thereby providing a holistic assessment of reasoning quality. Experimental results show that MedCEG surpasses existing methods in performance while producing clinically valid reasoning chains, representing a solid advancement in reliable medical AI reasoning. The code and models are available at https://github.com/LinjieMu/MedCEG.
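The abstract describes the Clinical Reasoning Procedure Reward as a combination of three scores computed against a reference CEG: Node Coverage, Structural Correctness, and Chain Completeness. A minimal sketch of such a composite reward is below; all function names, the set-overlap scoring, and the equal weighting are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
# Hypothetical sketch of a composite Clinical Reasoning Procedure Reward.
# The reference CEG is assumed to be given as a node set, an edge set, and
# a linear reasoning chain; the model's reasoning is assumed to have been
# parsed into predicted nodes and causal edges. All details are assumptions.

def node_coverage(pred_nodes, ref_nodes):
    """Fraction of reference evidence nodes mentioned in the model's reasoning."""
    if not ref_nodes:
        return 1.0
    return len(set(pred_nodes) & set(ref_nodes)) / len(ref_nodes)

def structural_correctness(pred_edges, ref_edges):
    """Fraction of predicted causal edges that also appear in the reference CEG."""
    if not pred_edges:
        return 0.0
    return len(set(pred_edges) & set(ref_edges)) / len(pred_edges)

def chain_completeness(pred_edges, ref_chain):
    """Fraction of consecutive steps of the reference chain reproduced."""
    steps = list(zip(ref_chain, ref_chain[1:]))
    if not steps:
        return 1.0
    pred = set(pred_edges)
    return sum(1 for s in steps if s in pred) / len(steps)

def procedure_reward(pred_nodes, pred_edges, ref_nodes, ref_edges, ref_chain,
                     weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three components, in [0, 1]."""
    w_n, w_s, w_c = weights
    return (w_n * node_coverage(pred_nodes, ref_nodes)
            + w_s * structural_correctness(pred_edges, ref_edges)
            + w_c * chain_completeness(pred_edges, ref_chain))
```

In an RL setup like the one the paper describes, a scalar of this form would be added to the outcome (answer-correctness) reward so that PPO updates also credit how the answer was reached, not just whether it was correct.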
Problem

Research questions and friction points this paper is trying to address.

Reasoning processes of medical LLMs lack clinical validation, limiting trust in decision support
Reinforcement learning improves answer accuracy but overlooks the validity of intermediate reasoning
No structured, verifiable supervision exists for assessing clinical reasoning quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs a Critical Evidence Graph (CEG) per case as a verifiable reasoning pathway
Introduces a Clinical Reasoning Procedure Reward scoring Node Coverage, Structural Correctness, and Chain Completeness
Trains medical language models with reinforcement learning that supervises the reasoning process itself