MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
EHR foundation models struggle to generalize to out-of-distribution (OOD) medical codes, hindering cross-vocabulary transfer and multi-center integration. To address this, we propose a transferable medical concept representation framework built upon the OMOP Common Data Model (CDM). Our approach employs a dual-path modeling strategy: (1) LLM-driven semantic definition enhancement to enrich code-level semantics, and (2) a graph neural network that explicitly encodes OMOP’s ontology graph structure. Additionally, we introduce a semantic similarity-guided concept replacement trajectory augmentation method to enable zero-shot generalization to OOD codes. Experiments demonstrate substantial improvements in predictive robustness on heterogeneous external datasets. The implementation is publicly available, facilitating unified pretraining of EHR foundation models across diverse healthcare centers.

📝 Abstract
Electronic health record (EHR) foundation models have become an active area of research, improving performance on a variety of medical tasks. Despite rapid advances, a fundamental limitation remains: they cannot process unseen medical codes outside their vocabulary. This limits the generality of EHR foundation models and the integration of models trained with different vocabularies. To address this, we propose MedRep, a framework for EHR foundation models based on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), which provides integrated medical concept representations and a basic data augmentation strategy for patient trajectories. For concept representation learning, we enrich each concept with a minimal definition obtained through large language model (LLM) prompts, and we enhance the resulting text-based representations using the graph ontology of the OMOP vocabulary. Trajectory augmentation randomly replaces selected concepts with similar concepts whose representations are closely related, letting the model practice with out-of-vocabulary concepts. Finally, we demonstrate that EHR foundation models trained with MedRep better maintain prediction performance on external datasets. Our code implementation is publicly available at https://github.com/kicarussays/MedRep.
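The trajectory augmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the function name, parameters (replacement probability `p`, neighborhood size `k`), and the use of cosine similarity over a dense embedding matrix are all assumptions made for the sake of the example.

```python
import numpy as np

def augment_trajectory(trajectory, embeddings, p=0.15, k=5, rng=None):
    """Randomly replace concepts in a patient trajectory with similar ones.

    trajectory : list of concept indices (rows of `embeddings`)
    embeddings : (V, d) array of learned concept representations
    p          : probability of replacing each concept (assumed value)
    k          : number of nearest neighbors to sample from (assumed value)
    """
    rng = rng or np.random.default_rng()
    # Normalize once so dot products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    out = []
    for c in trajectory:
        if rng.random() < p:
            sims = normed @ normed[c]               # similarity to every concept
            sims[c] = -np.inf                       # never pick the concept itself
            top_k = np.argpartition(sims, -k)[-k:]  # k most similar concepts
            out.append(int(rng.choice(top_k)))
        else:
            out.append(c)
    return out
```

Replacing concepts with near neighbors in representation space, rather than with random codes, keeps the augmented trajectory clinically plausible while still exposing the model to concepts it would otherwise never see during pretraining.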
Problem

Research questions and friction points this paper is trying to address.

Handling unseen medical codes in EHR models
Integrating models with different medical vocabularies
Improving EHR model performance on external datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses OMOP CDM for EHR concept representation
Enhances concepts with LLM prompts and ontology
Augments trajectories with similar concept replacements
Junmo Kim
School of Electrical Engineering, KAIST
Statistical Signal Processing, Image Processing, Computer Vision, Machine Learning, Information Theory
Namkyeong Lee
KAIST
AI for Science
Jiwon Kim
Interdisciplinary Program of Medical Informatics, Seoul National University
Kwangsoo Kim
Dept. of Transdisciplinary Medicine, ICMIT, Seoul National University Hospital; Center for Data Science, Healthcare AI Research Institute, Seoul National University Hospital; Dept. of Medicine, College of Medicine, Seoul National University