MedRep: Medical Concept Representation for General Electronic Health Record Foundation Models

📅 2025-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

EHR foundation models struggle to handle medical codes that fall outside their training vocabulary, which hinders cross-vocabulary transfer and multi-center integration. To address this, we propose MedRep, a transferable medical concept representation framework built on the OMOP Common Data Model (CDM). Concept representations are learned in two steps: (1) an LLM generates a minimal definition for each concept to enrich its text-based representation, and (2) a graph neural network refines those representations using the ontology graph of the OMOP vocabulary. MedRep also provides a trajectory augmentation method that randomly replaces selected concepts with semantically similar ones, so that the model is exposed to out-of-vocabulary concepts during training. Experiments show that EHR foundation models trained with MedRep better maintain prediction performance on external datasets. The implementation is publicly available, which supports unified pretraining of EHR foundation models across healthcare centers.

📝 Abstract

Electronic health record (EHR) foundation models have become an active area of research because of their improved performance on various medical tasks. Despite rapid advances, a fundamental limitation remains: they cannot process unseen medical codes that fall outside their vocabulary. This problem limits the generality of EHR foundation models and prevents the integration of models trained with different vocabularies. To address it, we propose MedRep, which is based on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) and provides integrated medical concept representations along with a basic data augmentation strategy for patient trajectories. For concept representation learning, we enrich the information of each concept with a minimal definition generated through large language model (LLM) prompts and enhance the text-based representations using the graph ontology of the OMOP vocabulary. Trajectory augmentation randomly replaces selected concepts with similar concepts that have closely related representations, letting the model practice with out-of-vocabulary concepts. Finally, we demonstrate that EHR foundation models trained with MedRep better maintain prediction performance on external datasets. Our code implementation is publicly available at https://github.com/kicarussays/MedRep.
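
The trajectory augmentation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The concept names, the 2-D toy embeddings, the cosine-similarity measure, and the parameters `replace_prob` and `k` are all assumptions made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_concepts(concept, embeddings, k=2):
    """Return the k concepts most similar to `concept`, excluding itself."""
    target = embeddings[concept]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != concept
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def augment_trajectory(trajectory, embeddings, replace_prob=0.2, k=2, rng=None):
    """Randomly swap concepts for one of their k nearest neighbours.

    `replace_prob` and `k` are hypothetical knobs, not values from the paper.
    """
    rng = rng or random.Random()
    augmented = []
    for concept in trajectory:
        if concept in embeddings and rng.random() < replace_prob:
            candidates = nearest_concepts(concept, embeddings, k)
            augmented.append(rng.choice(candidates) if candidates else concept)
        else:
            augmented.append(concept)
    return augmented


# Toy embeddings with hypothetical concept names (not real OMOP concept IDs).
toy_embeddings = {
    "type_2_diabetes": [1.0, 0.1],
    "diabetes_mellitus": [0.9, 0.2],
    "hypertension": [0.1, 1.0],
    "essential_hypertension": [0.2, 0.9],
}
trajectory = ["type_2_diabetes", "hypertension", "type_2_diabetes"]

# replace_prob=1.0 and k=1 force every concept to its single nearest neighbour.
augmented = augment_trajectory(
    trajectory, toy_embeddings, replace_prob=1.0, k=1, rng=random.Random(0)
)
print(augmented)
```

In this toy setup each concept is replaced by its closest neighbour in embedding space, so a model trained on the augmented sequence sees codes that never appeared in the original trajectory. In practice a lower replacement probability would keep most of the original trajectory intact.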

Problem

Research questions and friction points this paper is trying to address.

Handling unseen medical codes in EHR models

Integrating models with different medical vocabularies

Improving EHR model performance on external datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses OMOP CDM for EHR concept representation

Enhances concepts with LLM prompts and ontology

Augments trajectories with similar concept replacements
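
The ontology-based enhancement listed above can be pictured with a deliberately simplified stand-in: a single round of unlearned neighbour averaging over an ontology graph. This is not a trained graph neural network and not the paper's architecture. The function name `propagate`, the `self_weight` parameter, the toy embeddings, and the single edge are all hypothetical.

```python
def propagate(embeddings, edges, self_weight=0.5):
    """Blend each concept's text embedding with the mean of its graph neighbours.

    A non-learned, one-step simplification of message passing; `self_weight`
    is a hypothetical mixing coefficient.
    """
    neighbors = {concept: set() for concept in embeddings}
    for a, b in edges:
        if a in embeddings and b in embeddings:
            neighbors[a].add(b)
            neighbors[b].add(a)

    refined = {}
    for concept, vec in embeddings.items():
        nbrs = sorted(neighbors[concept])
        if not nbrs:
            # Isolated concepts keep their text-based embedding unchanged.
            refined[concept] = list(vec)
            continue
        mean = [
            sum(embeddings[n][i] for n in nbrs) / len(nbrs)
            for i in range(len(vec))
        ]
        refined[concept] = [
            self_weight * x + (1 - self_weight) * m for x, m in zip(vec, mean)
        ]
    return refined


# Toy text embeddings and one toy ontology edge (hypothetical names).
text_embeddings = {
    "diabetes_mellitus": [1.0, 0.0],
    "type_2_diabetes": [0.0, 1.0],
    "hypertension": [4.0, 4.0],
}
ontology_edges = [("type_2_diabetes", "diabetes_mellitus")]

refined = propagate(text_embeddings, ontology_edges)
print(refined)
```

The two linked concepts are pulled toward each other while the unconnected one is left alone, which is the intuition behind letting graph structure inform text-derived representations. A learned GNN would replace the fixed averaging with trainable transformations.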
