Learning Clinical Representations Under Systematic Distribution Shift

📅 2026-03-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of distribution shift in clinical machine learning models caused by inter-institutional differences in measurement protocols and workflows, which often introduce site-specific artifacts into physiological signals. The authors propose a practice-invariant representation learning framework that explicitly disentangles latent physiological factors from environment-dependent processes in multimodal clinical data—a first in this domain. By jointly optimizing predictive performance, adversarial environment regularization, and a cross-institution invariant risk penalty, the method enforces structural invariance constraints. Evaluated on multiple longitudinal electronic health record prediction tasks, the approach substantially improves out-of-distribution generalization across institutions, yielding AUROC gains of 2–3 percentage points while preserving in-domain performance and enhancing model calibration.

Technology Category

Application Category

📝 Abstract

Clinical machine learning models are increasingly trained using large scale, multimodal foundation paradigms, yet deployment environments often differ systematically from the data generating settings used during training. Such shifts arise from heterogeneous measurement policies, documentation practices, and institutional workflows, leading to representation entanglement between physiologic signal and practice specific artifacts. In this work, we propose a practice invariant representation learning framework for multimodal clinical prediction. We model clinical observations as arising from latent physiologic factors and environment dependent processes, and introduce an objective that jointly optimizes predictive performance while suppressing environment predictive information in the learned embedding. Concretely, we combine supervised risk minimization with adversarial environment regularization and invariant risk penalties across hospitals. Across multiple longitudinal EHR prediction tasks and cross institution evaluations, our method improves out of distribution AUROC by up to 2 to 3 points relative to masked pretraining and standard supervised baselines, while maintaining in distribution performance and improving calibration. These results demonstrate that explicitly accounting for systematic distribution shift during representation learning yields more robust and transferable clinical models, highlighting the importance of structural invariance alongside architectural scale in healthcare AI.

Problem

Research questions and friction points this paper is trying to address.

distribution shift

clinical representation learning

environmental heterogeneity

out-of-distribution generalization

healthcare AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

invariant representation learning

distribution shift

multimodal clinical prediction

adversarial regularization

electronic health records

🔎 Similar Papers

No similar papers found.

Authors to Follow