🤖 AI Summary
EHR foundation models struggle to generalize to out-of-distribution (OOD) medical codes, hindering cross-vocabulary transfer and multi-center integration. To address this, we propose a transferable medical concept representation framework built upon the OMOP Common Data Model (CDM). Our approach employs a dual-path modeling strategy: (1) LLM-driven semantic definition enhancement to enrich code-level semantics, and (2) a graph neural network that explicitly encodes OMOP’s ontology graph structure. Additionally, we introduce a trajectory augmentation method that replaces concepts with semantically similar ones, enabling zero-shot generalization to OOD codes. Experiments show that models trained with this framework better maintain predictive performance on heterogeneous external datasets. The implementation is publicly available, facilitating unified pretraining of EHR foundation models across diverse healthcare centers.
📝 Abstract
Electronic health record (EHR) foundation models have become an active area of research, driven by their improved performance on various medical tasks. Despite these rapid advances, a fundamental limitation remains: handling unseen medical codes that fall outside the vocabulary. This problem limits the generality of EHR foundation models and hinders the integration of models trained with different vocabularies. To address it, we propose MedRep for EHR foundation models, based on the Observational Medical Outcomes Partnership (OMOP) common data model (CDM), which provides integrated medical concept representations and a basic data augmentation strategy for patient trajectories. For concept representation learning, we enrich the information of each concept with a minimal definition generated through large language model (LLM) prompts and enhance the text-based representations through the graph ontology of the OMOP vocabulary. Trajectory augmentation randomly replaces selected concepts with similar concepts that have closely related representations, so that the model learns to handle out-of-vocabulary concepts. Finally, we demonstrate that EHR foundation models trained with MedRep better maintain their prediction performance on external datasets. Our code implementation is publicly available at https://github.com/kicarussays/MedRep.
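The trajectory augmentation described above can be illustrated with a minimal sketch. It is not taken from the MedRep repository: the function names, the `replace_prob` and `k` parameters, the use of cosine similarity, and the toy embeddings are all assumptions made for illustration. The idea shown is only the general one stated in the abstract, namely randomly swapping a concept for another concept whose representation is close to it.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_concepts(concept, embeddings, k=2):
    """Return the k concepts most similar to `concept`, excluding itself."""
    scored = [
        (cosine(embeddings[concept], vec), other)
        for other, vec in embeddings.items()
        if other != concept
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def augment_trajectory(trajectory, embeddings, replace_prob=0.3, k=2, rng=None):
    """Randomly swap concepts for one of their k nearest neighbours.

    `replace_prob` and `k` are hypothetical knobs, not values from the paper.
    """
    rng = rng or random.Random()
    augmented = []
    for concept in trajectory:
        if concept in embeddings and rng.random() < replace_prob:
            candidates = nearest_concepts(concept, embeddings, k)
            augmented.append(rng.choice(candidates) if candidates else concept)
        else:
            augmented.append(concept)
    return augmented


# Toy embeddings keyed by made-up concept names (not real OMOP concept IDs).
toy_embeddings = {
    "dx_a": [1.0, 0.0, 0.0],
    "dx_a_variant": [0.9, 0.1, 0.0],
    "drug_b": [0.0, 1.0, 0.0],
    "drug_b_variant": [0.1, 0.9, 0.0],
}
```

Calling `augment_trajectory(["dx_a", "drug_b"], toy_embeddings, rng=random.Random(0))` returns a trajectory of the same length in which each concept is either kept or replaced by a near neighbour. A real pipeline would presumably precompute neighbours once rather than rescoring the whole vocabulary for every replacement.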