Distributionally Robust Transfer Learning with Structurally Missing Covariates, with Application to Cross-National Cardiac Arrest Prediction

📅 2026-05-22
🤖 AI Summary
Clinical prediction models often lose performance when deployed across healthcare systems, because some covariates are structurally missing in the target domain and labeled outcomes are unavailable there. To address this, the work proposes DRUM, a distributionally robust unsupervised transfer learning framework that neither imputes the missing covariates nor relies on untestable assumptions about their target distribution. Instead, DRUM optimizes worst-case predictive performance over the unknown target conditional distribution of the missing covariates given the shared ones. The method combines distributionally robust optimization with a neural network generator, a tunable robustness parameter that controls the allowable deviation from the source conditional, and a bias-correction procedure that reduces sensitivity to nuisance estimation error. In simulations, DRUM substantially improves both mean and worst-case prediction error under distribution shift; in a real-world cross-national cardiac arrest prediction task, it yields better-calibrated predictions and improved clinical classification performance across sites.
📝 Abstract
Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and labeled outcomes are limited in the target domain. For example, high-performing models for out-of-hospital cardiac arrest (OHCA) rely on detailed prehospital measurements routinely collected in high-resource settings but unavailable in many international registries. Existing methods either discard missing covariates, sacrificing predictive information, or rely on untestable assumptions about their target distribution. We propose DRUM (Distributionally Robust Unsupervised transfer learning with structurally Missing covariates), a framework that transfers prediction models to target populations where certain covariates are structurally absent and outcome labels are unavailable. DRUM partitions covariates into shared components ($X$), observed across all settings, and missing components ($A$), observed only in the source. Rather than imputing missing covariates, DRUM optimizes worst-case predictive performance over the unknown target distribution of $A \mid X$ using a neural network generator, with a robustness parameter controlling allowable deviation from the source conditional. We further develop a bias correction procedure that reduces sensitivity to nuisance estimation error. Simulations show substantial improvements in both mean and worst-case prediction error under distribution shift. Applied to cross-national OHCA prediction, transferring models from a US registry to multiple Asian registries where prehospital variables are unrecorded, DRUM yields better-calibrated predictions and improved clinical classification performance across sites.
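To make the worst-case idea in the abstract concrete, the following is a minimal sketch and not the authors' method. It assumes, purely for illustration, that the allowable deviation from the source conditional of $A \mid X$ is measured by KL divergence with budget `rho`, and that the source conditional at a fixed $x$ is represented by equally weighted draws of $A$ with known per-draw losses. The abstract does not specify the divergence, and DRUM uses a neural network generator rather than the closed-form exponential tilting used here. All function names are hypothetical.

```python
import math

def tilt(losses, lam):
    """Reweight equally weighted draws with q_i proportional to exp(loss_i / lam)."""
    top = max(losses)
    raw = [math.exp((l - top) / lam) for l in losses]  # shift by max for stability
    total = sum(raw)
    return [r / total for r in raw]

def kl_from_uniform(q):
    """KL(q || uniform) for a weight vector q on len(q) points."""
    n = len(q)
    return sum(w * math.log(w * n) for w in q if w > 0.0)

def worst_case_weights(losses, rho, iters=200):
    """Weights maximizing the mean loss subject to KL(q || uniform) <= rho.

    Assumption: KL divergence is the deviation measure; rho plays the role
    of the robustness parameter (rho = 0 recovers the source conditional).
    """
    n = len(losses)
    top = max(losses)
    if rho <= 0.0 or top == min(losses):
        return [1.0 / n] * n
    n_top = sum(1 for l in losses if l == top)
    if rho >= math.log(n / n_top):
        # Budget is large enough to put all mass on the worst draws.
        return [1.0 / n_top if l == top else 0.0 for l in losses]
    lo, hi = 1e-8, 1e8  # bracket for the temperature lam
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if kl_from_uniform(tilt(losses, mid)) > rho:
            lo = mid  # too far from the source: raise the temperature
        else:
            hi = mid
    return tilt(losses, hi)

def worst_case_loss(losses, rho):
    """Worst-case expected loss over the KL ball of radius rho."""
    q = worst_case_weights(losses, rho)
    return sum(w * l for w, l in zip(q, losses))

# Hypothetical per-draw losses of a predictor at one fixed x.
losses = [0.1, 0.2, 0.9, 0.4]
for rho in (0.0, 0.1, 10.0):
    print(rho, worst_case_loss(losses, rho))
```

In this toy setting, a budget of zero gives the ordinary source-average loss, and increasing `rho` moves the objective monotonically toward the largest single loss. That is the trade-off a robustness parameter of this kind controls.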
Problem

Research questions and friction points this paper is trying to address.

transfer learning
structurally missing covariates
distributional robustness
cross-national prediction
clinical risk prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

distributionally robust optimization
transfer learning
structurally missing covariates
unsupervised domain adaptation
clinical prediction models
Siqi Li
Centre for Biomedical Data Science, Duke-NUS Medical School, Singapore
Chuan Hong
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
Ziye Tian
Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
Benjamin Sieu-Hon Leong
Emergency Medicine Department, National University Hospital, Singapore
Koshi Nakagawa
Department of Sport and Medical Science, Faculty of Physical Education, Kokushikan University, Tokyo, Japan
Hideharu Tanaka
Graduate School of Emergency Medical System, Kokushikan University, Tokyo, Japan
Sang Do Shin
Department of Emergency Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
Khuong Quoc Dai
Center for Emergency Medicine, Bach Mai Hospital, Hanoi, Vietnam
Do Ngoc Son
Center for Critical Care Medicine, Bach Mai Hospital, Hanoi, Vietnam
Marcus Eng Hock Ong
Health Services Research Centre, Singapore Health Services, Singapore
Nan Liu
Centre for Biomedical Data Science, Duke-NUS Medical School, Singapore
Molei Liu
Peking University
High-dimensional statistics, Statistical machine learning, Semiparametric theory, Model-X