🤖 AI Summary
Electronic health record (EHR) data are heterogeneous, irregularly sampled in time, and strongly domain dependent, which sets them apart from computer vision and natural language processing data and poses significant challenges for AI-driven clinical decision support. This paper presents a systematic survey of deep learning and large language model (LLM) approaches for EHR modeling. It introduces a unified five-dimensional taxonomy covering data-centric approaches, neural architecture design, learning-focused strategies, multimodal learning, and LLM-based modeling systems. Within each dimension, it reviews representative methods for data quality enhancement, structural and temporal representation, self-supervised learning, and integration with clinical knowledge. It also highlights three emerging directions: EHR foundation models, LLM-driven clinical agents, and EHR-to-text translation for downstream reasoning. The survey closes with open challenges in benchmarking, explainability, clinical alignment, and generalization across clinical settings. A comprehensive list of EHR-related methods is available at https://survey-on-tabular-data.github.io/.
📝 Abstract
Artificial intelligence (AI) has demonstrated significant potential in transforming healthcare through the analysis and modeling of electronic health records (EHRs). However, the inherent heterogeneity, temporal irregularity, and domain-specific nature of EHR data present unique challenges that differ fundamentally from those in vision and natural language tasks. This survey offers a comprehensive overview of recent advancements at the intersection of deep learning, large language models (LLMs), and EHR modeling. We introduce a unified taxonomy that spans five key design dimensions: data-centric approaches, neural architecture design, learning-focused strategies, multimodal learning, and LLM-based modeling systems. Within each dimension, we review representative methods addressing data quality enhancement, structural and temporal representation, self-supervised learning, and integration with clinical knowledge. We further highlight emerging trends such as foundation models, LLM-driven clinical agents, and EHR-to-text translation for downstream reasoning. Finally, we discuss open challenges in benchmarking, explainability, clinical alignment, and generalization across diverse clinical settings. This survey aims to provide a structured roadmap for advancing AI-driven EHR modeling and clinical decision support. For a comprehensive list of EHR-related methods, see https://survey-on-tabular-data.github.io/.