EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) face two critical bottlenecks in electronic health record (EHR) analysis: narrow task coverage and weak clinical reasoning. To address these, we propose EHR-R1, the first family of reasoning-enhanced LLMs designed specifically for EHRs. The method introduces (i) EHR-Ins, the first large-scale EHR reasoning instruction dataset, constructed via a novel thinking-graph-driven data generation framework; (ii) EHR-Bench, a comprehensive benchmark covering 42 clinically relevant tasks; and (iii) a three-stage training paradigm of domain adaptation, reasoning enhancement, and reinforcement learning. On EHR-Bench, the 72B-parameter EHR-R1 surpasses GPT-4o by over 30 points; on EHRSHOT, it achieves a 10% zero-shot AUROC improvement, significantly outperforming leading open-source and commercial models. EHR-R1 establishes a new paradigm for trustworthy, interpretable clinical AI.

📝 Abstract
Electronic Health Records (EHRs) contain rich yet complex information, and their automated analysis is critical for clinical decision-making. Despite recent advances of large language models (LLMs) in clinical workflows, their ability to analyze EHRs remains limited by narrow task coverage and a lack of EHR-oriented reasoning capabilities. To bridge this gap, we present EHR-Ins, a large-scale, comprehensive EHR reasoning instruction dataset comprising 300k high-quality reasoning cases and 4M non-reasoning cases across 42 distinct EHR tasks. Its core innovation is a thinking-graph-driven framework that enables the generation of high-quality reasoning data at scale. Building on this dataset, we develop EHR-R1, a series of reasoning-enhanced LLMs with up to 72B parameters tailored for EHR analysis. Through a multi-stage training paradigm comprising domain adaptation, reasoning enhancement, and reinforcement learning, EHR-R1 systematically acquires domain knowledge and diverse reasoning capabilities, enabling accurate and robust EHR analysis. Lastly, we introduce EHR-Bench, a new benchmark curated from MIMIC-IV and spanning 42 tasks, to comprehensively assess reasoning and prediction across EHR scenarios. In experiments, we show that the resulting EHR-R1 consistently outperforms state-of-the-art commercial and open-source LLMs (including DeepSeek-V3 and GPT-4o), surpassing GPT-4o by over 30 points on EHR-Bench and achieving a 10% higher zero-shot AUROC on EHRSHOT. Collectively, EHR-Ins, EHR-R1, and EHR-Bench significantly advance the development of more reliable and clinically relevant EHR analysis.
Problem

Research questions and friction points this paper is trying to address.

Enhancing EHR analysis with reasoning capabilities for clinical decisions
Addressing limited task coverage in current clinical language models
Developing specialized models for complex electronic health record interpretation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Thinking-graph framework generates large-scale EHR reasoning data
Multi-stage training enhances domain knowledge and reasoning capabilities
EHR-R1 model outperforms state-of-the-art LLMs on clinical benchmarks