🤖 AI Summary
Current medical large language models lack standardized, verifiable, and interpretable benchmarks for evaluating clinical reasoning capabilities. To address this gap, we introduce MedReason—the first large-scale, high-precision medical reasoning dataset comprising 32,682 question-answer pairs annotated with fine-grained, stepwise reasoning traces. MedReason pioneers a structured knowledge-graph–driven "chain-of-thought" generation paradigm, mapping clinical questions to logically grounded inference paths aligned with evidence-based medicine principles. Clinical validity is ensured through multi-specialty physician collaboration in annotation and evaluation. Supervised fine-tuning on MedReason improves DeepSeek-Distill-8B's reasoning accuracy by 7.7%. Our custom MedReason-8B model outperforms Huatuo-o1-8B by 4.2% on the MedBullets benchmark, demonstrating substantial gains in the faithfulness and interpretability of clinical decision-making.
📝 Abstract
Medical tasks such as diagnosis and treatment planning require precise and complex reasoning, particularly in life-critical domains. Unlike mathematical reasoning, medical reasoning demands meticulous, verifiable thought processes to ensure reliability and accuracy. However, there is a notable lack of datasets that provide transparent, step-by-step reasoning to validate and enhance the medical reasoning ability of AI models. To bridge this gap, we introduce MedReason, a large-scale, high-quality medical reasoning dataset designed to enable faithful and explainable medical problem-solving in large language models (LLMs). We utilize a structured medical knowledge graph (KG) to convert clinical QA pairs into logical chains of reasoning, or "thinking paths", which trace connections from question elements to answers via relevant KG entities. Each path is validated for consistency with clinical logic and evidence-based medicine. Our pipeline generates detailed reasoning for various medical questions from 7 medical datasets, resulting in a dataset of 32,682 question-answer pairs, each with detailed, step-by-step explanations. Experiments demonstrate that fine-tuning with our dataset consistently boosts medical problem-solving capabilities, achieving significant gains of up to 7.7% for DeepSeek-Distill-8B. Our top-performing model, MedReason-8B, outperforms Huatuo-o1-8B, a state-of-the-art medical reasoning model, by up to 4.2% on the clinical benchmark MedBullets. We also engage medical professionals from diverse specialties to assess our dataset's quality, ensuring MedReason offers accurate and coherent medical reasoning. Our data, models, and code will be publicly available.
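The core idea of mapping question elements to answers via KG paths can be sketched as a graph search over entity-relation triples. The toy graph, entity names, and path-ranking below are purely illustrative assumptions, not the actual MedReason pipeline:

```python
# Hypothetical sketch of KG-guided "thinking path" extraction: entities
# mentioned in a question and its answer are mapped to nodes of a medical
# knowledge graph, and simple paths connecting them serve as candidate
# reasoning chains. The triples here are a toy example, not real KG data.
import networkx as nx

# Toy directed medical KG as (head, relation, tail) triples.
triples = [
    ("polyuria", "symptom_of", "diabetes_mellitus"),
    ("polydipsia", "symptom_of", "diabetes_mellitus"),
    ("diabetes_mellitus", "treated_by", "metformin"),
    ("diabetes_mellitus", "diagnosed_by", "hba1c_test"),
]

kg = nx.DiGraph()
for head, rel, tail in triples:
    kg.add_edge(head, tail, relation=rel)

def thinking_paths(question_entities, answer_entity, max_hops=4):
    """Collect simple KG paths from each question entity to the answer entity,
    rendered as human-readable 'entity --relation--> entity' reasoning steps."""
    paths = []
    for src in question_entities:
        if src not in kg:
            continue  # skip entities the KG does not cover
        for path in nx.all_simple_paths(kg, src, answer_entity, cutoff=max_hops):
            steps = [
                f"{u} --{kg[u][v]['relation']}--> {v}"
                for u, v in zip(path, path[1:])
            ]
            paths.append(" ; ".join(steps))
    return paths

for p in thinking_paths(["polyuria", "polydipsia"], "metformin"):
    print(p)
```

In the paper's actual pipeline, such raw paths are then filtered for consistency with clinical logic and expanded by an LLM into natural-language step-by-step explanations; this sketch only shows the structural path-finding stage.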