AI Summary
Electronic health record (EHR) data exhibit high heterogeneity, and existing specialized models rely heavily on private, institution-specific medical datasets, limiting generalizability and scalability.
Method: We propose a novel paradigm leveraging general-purpose large language models (LLMs) as lightweight encoders. Patient records are serialized into structured Markdown text, and clinical codes are mapped to semantically rich natural language descriptions. This enables zero-shot or few-shot clinical prediction by harnessing LLMs' broad generalization capabilities acquired from public corpora.
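The serialization step can be sketched as follows. The record schema, the code-to-description lookup table, and the `serialize_record` helper are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
# Sketch of the serialization idea: clinical codes are mapped to
# natural-language descriptions and the visit history is rendered as
# structured Markdown for an LLM embedding model. The record layout and
# the description table below are hypothetical examples.

# Hypothetical code -> description lookup (in practice, drawn from an
# ontology such as ICD-10 or SNOMED CT).
CODE_DESCRIPTIONS = {
    "E11.9": "Type 2 diabetes mellitus without complications",
    "I10": "Essential (primary) hypertension",
}

def serialize_record(record: dict) -> str:
    """Render one patient's record as Markdown text."""
    lines = [f"# Patient {record['patient_id']}", ""]
    for visit in record["visits"]:
        lines.append(f"## Visit on {visit['date']}")
        for code in visit["codes"]:
            # Fall back to the raw code if no description is known.
            lines.append(f"- {CODE_DESCRIPTIONS.get(code, code)}")
        lines.append("")
    return "\n".join(lines)

record = {
    "patient_id": "P001",
    "visits": [{"date": "2021-03-14", "codes": ["E11.9", "I10"]}],
}
markdown_text = serialize_record(record)
print(markdown_text)
```

The resulting Markdown string is what would be passed to the embedding model; because codes become human-readable phrases, the LLM can draw on knowledge acquired during pretraining on public text.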
Contribution/Results: We conduct the first systematic evaluation on the EHRSHOT benchmark across 15 tasks, demonstrating that general LLM embedding models (e.g., GTE-Qwen2-7B and LLM2Vec-Llama3.1-8B) match or surpass specialized foundation models such as CLIMBR-T-Base, particularly under few-shot settings, where they also show superior robustness. We further identify positive correlations between LLM parameter count, context length, and predictive performance. Our findings establish that general LLMs can effectively replace domain-specific EHR encoders, significantly improving cross-institutional generalizability and deployment scalability.
Abstract
Electronic Health Records (EHRs) offer rich potential for clinical prediction, yet their inherent complexity and heterogeneity pose significant challenges for traditional machine learning approaches. Domain-specific EHR foundation models trained on large collections of unlabeled EHR data have demonstrated promising improvements in predictive accuracy and generalization; however, their training is constrained by limited access to diverse, high-quality datasets and by inconsistencies in coding standards and healthcare practices. In this study, we explore the possibility of using general-purpose Large Language Model (LLM)-based embedding methods as EHR encoders. By serializing patient records into structured Markdown text and transforming clinical codes into human-readable descriptors, we leverage the extensive generalization capabilities of LLMs pretrained on vast public corpora, thereby bypassing the need for proprietary medical datasets. We systematically evaluate two state-of-the-art LLM-embedding models, GTE-Qwen2-7B-Instruct and LLM2Vec-Llama3.1-8B-Instruct, across 15 diverse clinical prediction tasks from the EHRSHOT benchmark, comparing their performance to an EHR-specific foundation model, CLIMBR-T-Base, and traditional machine learning baselines. Our results demonstrate that LLM-based embeddings frequently match or exceed the performance of specialized models, even in few-shot settings, and that their effectiveness scales with the size of the underlying LLM and the available context window. Overall, our findings demonstrate that repurposing LLMs for EHR encoding offers a scalable and effective approach for clinical prediction, capable of overcoming the limitations of traditional EHR modeling and facilitating more interoperable and generalizable healthcare applications.
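The few-shot evaluation described above can be sketched as below. The random vectors stand in for frozen LLM embeddings of serialized records, and the nearest-centroid head is one simple choice of lightweight classifier fit on k labeled examples per class; it is an illustration of the frozen-embedding protocol, not the authors' exact pipeline.

```python
import numpy as np

# Few-shot sketch: an LLM encoder maps each serialized record to a fixed
# vector, and only a tiny head is trained on k labeled examples per class.
# The Gaussian vectors below are stand-ins for real LLM embeddings
# (hypothetical data, much lower-dimensional than e.g. a 4096-dim model).

rng = np.random.default_rng(0)
dim = 16

def embed(label: int, n: int) -> np.ndarray:
    # Simulated frozen embeddings: class 1 is shifted relative to class 0.
    return rng.normal(loc=0.5 * label, scale=1.0, size=(n, dim))

k = 8  # few-shot budget: k labeled examples per class
train = np.vstack([embed(0, k), embed(1, k)])
train_labels = np.array([0] * k + [1] * k)

# Lightweight head: nearest class centroid in embedding space.
centroids = np.stack([train[train_labels == c].mean(axis=0) for c in (0, 1)])

def predict(x: np.ndarray) -> int:
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

# Evaluate on held-out class-1 samples.
test_x = embed(1, 200)
acc = float(np.mean([predict(x) == 1 for x in test_x]))
print(f"few-shot accuracy on held-out positives: {acc:.2f}")
```

The key point the sketch conveys is that the encoder is never fine-tuned: all task adaptation happens in the small head, which is why richer, more general embeddings translate directly into better few-shot performance.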