Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets

📅 2023-10-11

📈 Citations: 2

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Early diagnosis and treatment of emerging diseases face critical challenges including scarcity of electronic medical record (EMR) data, severe cross-source distribution shift, limited labeled samples, and stringent privacy constraints. Method: We propose an unsupervised domain-invariant representation learning framework for clinical EMRs. To our knowledge, this is the first approach in clinical representation learning to explicitly regularize inter-dataset feature distribution shifts—achieved by integrating Wasserstein distance minimization with a task-guided feature disentanglement architecture, enhanced via adversarial training for robust source-to-target feature transfer. The model operates without target-domain labels, jointly optimizing for both distribution invariance and task relevance. Contribution/Results: Extensive experiments demonstrate substantial improvements over state-of-the-art baselines in multi-task prognosis prediction. Training convergence accelerates by 40% under low-shot settings, and prediction accuracy for low-resource infectious diseases—such as emerging pathogens—increases markedly.

📝 Abstract

Emerging diseases present challenges in symptom recognition and timely clinical intervention due to limited available information. An effective prognostic model could assist physicians in making accurate diagnoses and designing personalized treatment plans to prevent adverse outcomes. However, in the early stages of disease emergence, several factors hamper model development: limited data collection, insufficient clinical experience, and privacy and ethical concerns restrict data availability and complicate accurate label assignment. Furthermore, Electronic Medical Record (EMR) data from different diseases or sources often exhibit significant cross-dataset feature misalignment, severely impacting the effectiveness of deep learning models. We present a domain-invariant representation learning method that constructs a transition model between source and target datasets. By constraining the distribution shift of features generated across different domains, we capture domain-invariant features specifically relevant to downstream tasks, developing a unified domain-invariant encoder that achieves better feature representation across various task domains. Experimental results across multiple target tasks demonstrate that our proposed model surpasses competing baseline methods and achieves faster training convergence, particularly when working with limited data. Extensive experiments validate our method's effectiveness in providing more accurate predictions for emerging pandemics and other diseases. Code is publicly available at https://github.com/wang1yuhang/domain_invariant_network.

Problem

Research questions and friction points this paper is trying to address.

Addresses feature misalignment in EMR datasets

Develops domain-invariant clinical representation learning

Improves predictions for emerging diseases with limited data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-invariant representation learning method

Transition model between datasets

Unified domain-invariant encoder

🔎 Similar Papers

No similar papers found.