Representation Learning to Advance Multi-institutional Studies with Electronic Health Record Data

📅 2025-02-12
📈 Citations: 0
✹ Influential: 0
đŸ€– AI Summary
Problem: Collaborative analysis of multi-institutional electronic health records (EHRs) is hindered by heterogeneous clinical concept coding and strict privacy constraints. Method: We propose GAME, an algorithm that aligns clinical concepts and learns their representations across institutions and languages without sharing patient-level data. GAME integrates data at three levels: knowledge graphs that link codes to existing knowledge sources within each institution, language models that map institution-specific codes to established standard codes, and a graph attention network (GAT) that quantifies the strength of the relationships between codes. The embeddings are trained jointly using transfer learning and federated learning. Results: Tested and validated across seven institutions and two languages, GAME selects relevant features as inputs for AI-driven algorithms in conditions such as heart failure and rheumatoid arthritis. GAME-harmonized data also support multi-institutional studies of Alzheimer's disease outcomes and of suicide risk among patients with mental health disorders, with no patient-level data leaving individual institutions.

📝 Abstract
The adoption of EHRs has expanded opportunities to leverage data-driven algorithms in clinical care and research. A major bottleneck in conducting multi-institutional EHR studies is data heterogeneity across systems: numerous codes either do not exist at every institution or represent different clinical concepts from one institution to the next. The need for data privacy further limits the feasibility of pooling the multi-institutional patient-level data required to study similarities and differences across patient subgroups. To address these challenges, we developed the GAME algorithm. Tested and validated across 7 institutions and 2 languages, GAME integrates data at several levels: (1) at the institutional level, using knowledge graphs to establish relationships between codes and existing knowledge sources, which provides the medical context for standard codes and their relationships to each other; (2) between institutions, leveraging language models to determine the relationships between institution-specific codes and established standard codes; and (3) across codes, quantifying the strength of the relationships between them using a graph attention network. Jointly trained embeddings are created using transfer and federated learning to preserve data privacy. In this study, we demonstrate the applicability of GAME in selecting relevant features as inputs for AI-driven algorithms in a range of conditions, such as heart failure and rheumatoid arthritis. We then highlight the application of GAME-harmonized multi-institutional EHR data in a study of Alzheimer's disease outcomes and suicide risk among patients with mental health disorders, without sharing patient-level data outside individual institutions.
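As a rough illustration of level (2) in the abstract, the sketch below maps institution-specific codes to standard codes by nearest-neighbor cosine similarity over embedding vectors. It is not the GAME implementation: the code names, the toy three-dimensional vectors, the `map_local_codes` helper, and the `threshold` value are all hypothetical, and the hand-written vectors only stand in for embeddings that a language model might produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_local_codes(local_emb, standard_emb, threshold=0.8):
    """Map each local code to its most similar standard code.

    A local code maps to None when even its best match falls below
    `threshold` (a hypothetical cutoff chosen for this toy example).
    """
    mapping = {}
    for local_code, vec in local_emb.items():
        best_code, best_sim = None, -1.0
        for std_code, std_vec in standard_emb.items():
            sim = cosine(vec, std_vec)
            if sim > best_sim:
                best_code, best_sim = std_code, sim
        mapping[local_code] = best_code if best_sim >= threshold else None
    return mapping


# Made-up embeddings for illustration only; real ones would come from a language model.
standard_emb = {
    "STD:heart_failure": [0.9, 0.1, 0.0],
    "STD:rheumatoid_arthritis": [0.0, 0.2, 0.9],
}
local_emb = {
    "LOCAL:HF_001": [0.8, 0.2, 0.1],
    "LOCAL:RA_017": [0.1, 0.1, 0.95],
    "LOCAL:UNK_999": [0.5, 0.5, 0.5],
}

mapping = map_local_codes(local_emb, standard_emb)
for local_code, std_code in mapping.items():
    print(local_code, "->", std_code)
```

Leaving ambiguous codes unmapped, rather than forcing every local code onto some standard code, is one possible design choice for this kind of matching; the abstract does not say how GAME handles such cases.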
Problem

Research questions and friction points this paper is trying to address.

Addressing EHR data heterogeneity across institutions
Ensuring data privacy in multi-institutional studies
Developing the GAME algorithm for harmonized EHR data analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

GAME algorithm
knowledge graphs
federated learning
Authors

Doudou Zhou
National University of Singapore
High-dimensional Statistics, EHR Data Analysis, Change-point Detection, Transfer Learning

Han Tong
Department of Statistics, Columbia University, NY, USA

Linshanshan Wang
Harvard T.H. Chan School of Public Health, MA, USA

Suqi Liu
Harvard Medical School, MA, USA

Xin Xiong
University of Southern California
Image Processing, Computer Vision, Video Compression

Ziming Gan
PhD in statistics, University of Chicago
EHR data, single cell

R. Griffier
Univ. Bordeaux, INSERM, Bordeaux Population Health Research Center, Bordeaux, France; CHU de Bordeaux, Service d’Information MĂ©dicale, Bordeaux, France; Inria SISTM Team, Talence, France

B. Hejblum
Univ. Bordeaux, INSERM, Bordeaux Population Health Research Center, Bordeaux, France; CHU de Bordeaux, Service d’Information MĂ©dicale, Bordeaux, France; Inria SISTM Team, Talence, France

Yun-Chung Liu
Duke University, Durham, NC, USA

Chuan Hong
Duke University, Durham, NC, USA

Clara-Lea Bonzel
Harvard T.H. Chan School of Public Health, MA, USA; Harvard Medical School, MA, USA

T. Cai
Harvard T.H. Chan School of Public Health, MA, USA; Harvard Medical School, MA, USA; VA Boston Healthcare System, Boston, MA, USA

K. Pan
Brown University, Providence, RI, USA

Y. Ho
VA Boston Healthcare System, Boston, MA, USA

Lauren Costa
VA Boston Healthcare System, Boston, MA, USA

V. Panickan
Harvard Medical School, MA, USA; VA Boston Healthcare System, Boston, MA, USA

J. M. Gaziano
Harvard Medical School, MA, USA; VA Boston Healthcare System, Boston, MA, USA

Kenneth Mandl
Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA, USA

V. Jouhet
Univ. Bordeaux, INSERM, Bordeaux Population Health Research Center, Bordeaux, France

Rodolphe Thiébaut
Université de Bordeaux
Medicine, Statistics

Zongqi Xia
University of Pittsburgh
Neuroimmunology, Neurodegeneration, Precision Medicine, Translational Research, Biomedical Informatics

Kelly Cho
Harvard Medical School, MA, USA; VA Boston Healthcare System, Boston, MA, USA

Katherine P. Liao
VA Boston Healthcare System, Boston, MA, USA; Brigham and Women’s Hospital, Boston, MA, USA