A Comprehensive Survey of Electronic Health Record Modeling: From Deep Learning Approaches to Large Language Models

📅 2025-07-17
🤖 AI Summary
Electronic health record (EHR) data are heterogeneous, irregularly sampled in time, and strongly domain-dependent. These properties set EHR modeling apart from computer vision and natural language processing tasks and make effective AI-driven clinical decision support difficult. This paper systematically surveys deep learning and large language model (LLM) approaches to EHR modeling. It introduces a unified taxonomy spanning five design dimensions: data-centric approaches, neural architecture design, learning-focused strategies, multimodal learning, and LLM-based modeling systems. Within each dimension it reviews representative methods for data quality enhancement, structural and temporal representation, self-supervised learning, and integration with clinical knowledge. It also highlights three emerging directions (EHR foundation models, LLM-driven clinical agents, and EHR-to-text translation for downstream reasoning) and discusses open challenges in benchmarking, explainability, clinical alignment, and generalization across clinical settings. A list of EHR-related methods is maintained at https://survey-on-tabular-data.github.io/.

📝 Abstract
Artificial intelligence (AI) has demonstrated significant potential in transforming healthcare through the analysis and modeling of electronic health records (EHRs). However, the inherent heterogeneity, temporal irregularity, and domain-specific nature of EHR data present unique challenges that differ fundamentally from those in vision and natural language tasks. This survey offers a comprehensive overview of recent advancements at the intersection of deep learning, large language models (LLMs), and EHR modeling. We introduce a unified taxonomy that spans five key design dimensions: data-centric approaches, neural architecture design, learning-focused strategies, multimodal learning, and LLM-based modeling systems. Within each dimension, we review representative methods addressing data quality enhancement, structural and temporal representation, self-supervised learning, and integration with clinical knowledge. We further highlight emerging trends such as foundation models, LLM-driven clinical agents, and EHR-to-text translation for downstream reasoning. Finally, we discuss open challenges in benchmarking, explainability, clinical alignment, and generalization across diverse clinical settings. This survey aims to provide a structured roadmap for advancing AI-driven EHR modeling and clinical decision support. For a comprehensive list of EHR-related methods, kindly refer to https://survey-on-tabular-data.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Addressing EHR data heterogeneity and temporal irregularity challenges
Surveying deep learning and LLM advancements in EHR modeling
Exploring clinical knowledge integration and self-supervised learning methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning for EHR data analysis
Large language models in healthcare
Multimodal learning with clinical knowledge
Weijieying Ren
Stanford University
Artificial Intelligence, Healthcare, Natural Language Processing
Jingxi Zhu
Information Sciences and Technology, The Pennsylvania State University, USA
Zehao Liu
Information Sciences and Technology, The Pennsylvania State University, USA
Tianxiang Zhao
The Pennsylvania State University
Vasant Honavar
Information Sciences and Technology, The Pennsylvania State University, USA