MedGuideX: Internalizing Decision Logic from Executable Guidelines into Large Language Models for Clinical Reasoning

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing methods: clinical practice guidelines (CPGs) are typically used as free-text documents for language model training or retrieval, which leaves their underlying decision logic implicit. The study first converts CPG recommendations into executable decision logic and then uses that logic to generate factual and counterfactual question-answer pairs, which serve as structured supervision for post-training a medical large language model. The aim is for the model to internalize guideline-driven clinical reasoning rather than memorize surface-level text. Across four clinical reasoning benchmarks, the resulting model, MedGuideX, achieves a 10.28% relative improvement in average accuracy. In physician evaluation, its rationales better recover clinician-authored reasoning steps and are preferred for faithfulness, validity, completeness, and clarity.
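For readers unsure what "relative improvement in average accuracy" means, the expression below gives the conventional reading. It is an assumption: the summary does not spell out the formula, and the symbols $a_i^{\text{base}}$ and $a_i^{\text{new}}$ (per-benchmark accuracy of the baseline and of the post-trained model) are introduced here only for illustration.

```latex
\bar{a}^{\text{base}} = \frac{1}{4}\sum_{i=1}^{4} a_i^{\text{base}}, \qquad
\bar{a}^{\text{new}} = \frac{1}{4}\sum_{i=1}^{4} a_i^{\text{new}}, \qquad
\Delta_{\text{rel}} = \frac{\bar{a}^{\text{new}} - \bar{a}^{\text{base}}}{\bar{a}^{\text{base}}} \times 100\%
```

Under this reading, the gain is measured relative to the baseline's average accuracy, so it is neither an absolute gain in percentage points nor a mean of per-benchmark relative gains.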

📝 Abstract

Clinical practice guidelines (CPGs) encode evidence-based decision logic that clinicians apply by evaluating patient variables, conditional criteria, and recommendation rules. However, existing methods often use CPGs as free-text training data or retrieval sources, underutilizing their procedural decision structure. To better exploit this structure, we introduce a guideline-derived training pipeline that transforms CPG recommendations into executable clinical decision logic and uses it to generate factual and counterfactual question-answering data. These data teach models both guideline-supported decisions and how decisions change under different patient conditions. Post-training a medical LLM on the generated data yields MedGuideX. Across four clinical reasoning benchmarks, MedGuideX achieves a 10.28% relative improvement in average accuracy. Physician evaluation further shows that MedGuideX better recovers clinician-authored reasoning steps and produces physician-preferred rationales in faithfulness, validity, completeness, and clarity. Overall, our results show that executable decision logic from CPGs can be transformed into scalable supervision for building reliable medical LLMs.
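The idea of turning a recommendation into executable logic and deriving factual and counterfactual question-answer pairs from it can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Rule` class, the helper names, the `risk_score` variable, the threshold, and the recommendation strings are toy placeholders, not the paper's pipeline and not a real clinical guideline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Patient = Dict[str, float]  # patient variables, e.g. {"risk_score": 7.0}


@dataclass
class Rule:
    """A toy executable recommendation: a condition plus two outcomes."""
    name: str
    condition: Callable[[Patient], bool]
    recommendation: str  # returned when the condition holds
    fallback: str        # returned otherwise

    def decide(self, patient: Patient) -> str:
        return self.recommendation if self.condition(patient) else self.fallback


def make_qa(rule: Rule, patient: Patient, label: str) -> Dict[str, str]:
    """Execute the rule on a patient and wrap the result as a QA pair."""
    facts = ", ".join(f"{k}={v}" for k, v in sorted(patient.items()))
    return {
        "type": label,
        "question": f"Patient with {facts}. What does rule '{rule.name}' recommend?",
        "answer": rule.decide(patient),
    }


def factual_and_counterfactual(
    rule: Rule, patient: Patient, variable: str, new_value: float
) -> List[Dict[str, str]]:
    """Build one factual pair and one pair with a single variable changed."""
    altered = {**patient, variable: new_value}  # copy; original is untouched
    return [
        make_qa(rule, patient, "factual"),
        make_qa(rule, altered, "counterfactual"),
    ]


# Hypothetical rule and threshold, for illustration only.
toy_rule = Rule(
    name="toy_threshold",
    condition=lambda p: p["risk_score"] >= 5,
    recommendation="start treatment A",
    fallback="monitor",
)

pairs = factual_and_counterfactual(toy_rule, {"risk_score": 7.0}, "risk_score", 2.0)
for pair in pairs:
    print(pair["type"], "->", pair["answer"])
```

Because the answers come from executing the rule rather than from paraphrasing guideline text, the counterfactual pair shows how the decision changes when one patient variable changes, which is the kind of supervision the abstract describes.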

Problem

Research questions and friction points this paper is trying to address.

clinical practice guidelines

decision logic

large language models

clinical reasoning

executable guidelines

Innovation

Methods, ideas, or system contributions that make the work stand out.

executable decision logic

clinical practice guidelines

counterfactual reasoning