🤖 AI Summary
This study addresses three critical challenges in electronic health record (EHR) analytics: (1) the limited utility of unstructured clinical text for high-quality clinical decision support; (2) cross-institutional semantic heterogeneity in EHR data; and (3) insufficient generalizability and fairness of medical AI models. To tackle these challenges, we propose the first systematic, large language model (LLM)-driven framework that integrates heterogeneous EHR modalities, including free-text notes, structured laboratory values, and clinical codes. Our method introduces an ontology-guided, cross-institutional semantic alignment mechanism, coupled with interpretable fine-tuning and bias-correction strategies, to enable text-augmented multimodal representation learning. Evaluated on multicenter clinical prediction tasks, our framework achieves a mean AUC improvement of 5.2%, demonstrating enhanced model robustness. It also exhibits superior predictive fairness across diverse demographic subgroups, supporting its equitable performance in real-world heterogeneous healthcare settings.
📝 Abstract
The advent of large language models (LLMs) has opened new avenues for analyzing complex, unstructured data, particularly in the medical domain. Electronic health records (EHRs) contain a wealth of information in various formats, including free-text clinical notes, structured lab results, and diagnostic codes. This paper explores how advanced language models can leverage these diverse data sources to improve clinical decision support. We discuss how text-based features, often overlooked in traditional high-dimensional EHR analysis, can provide semantically rich representations and help harmonize data across institutions. We also examine the challenges and opportunities of incorporating medical codes and of ensuring the generalizability and fairness of AI models in healthcare.