🤖 AI Summary
This paper addresses out-of-distribution (OOD) generalization under unobserved confounding: a latent variable $Z$ jointly influences both the input $X$ and the label $Y$, inducing heterogeneity in the predictor, i.e., $P(Y\mid X) = \mathbb{E}_{Z\mid X}[P(Y\mid X,Z)]$. Critically, $Z$ is unobserved during training, its distribution shifts between training and test domains ($P^{\text{te}}(Z) \neq P^{\text{tr}}(Z)$), and test inputs $X$ are inaccessible, rendering the standard covariate-shift and label-shift assumptions invalid. To overcome the limitations of existing methods, such as reliance on multiple auxiliary variables or complex modeling, the paper proposes a set of lightweight, identifiability-enabling assumptions. Based on these, it constructs a structurally simple and scalable expected conditional average predictor, $\mathbb{E}_{P^{\text{te}}(Z)}[f_Z(X)]$, integrating invariant feature learning with confounding-robust estimation. The approach is theoretically grounded, achieves significant accuracy improvements on standard OOD benchmarks, and enjoys linear time complexity and strong scalability.
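To make the central formula concrete, here is a minimal sketch of how an expected conditional average predictor $\hat{Y} = \mathbb{E}_{P^{\text{te}}(Z)}[f_Z(X)]$ could be evaluated when $Z$ is discrete: each value of $Z$ has its own predictor $f_Z$, and their outputs are averaged under an estimate of the test-time distribution $P^{\text{te}}(Z)$. The function name, the per-$Z$ linear predictors, and the uniform weights below are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def expected_conditional_average(x, predictors, p_te_z):
    """Sketch of E_{P^te(Z)}[f_Z(X)] for discrete Z.

    predictors: one callable f_z per value of Z (hypothetical)
    p_te_z:     weights summing to 1, an estimate of P^te(Z)
    """
    preds = np.array([f(x) for f in predictors])  # shape: (|Z|, ...)
    # Broadcast the Z-weights over the prediction dimensions.
    w = np.asarray(p_te_z).reshape(-1, *([1] * (preds.ndim - 1)))
    return (w * preds).sum(axis=0)

# Toy usage: two hypothetical per-Z predictors, uniform test-time P(Z).
f0 = lambda x: 0.2 * x
f1 = lambda x: 0.8 * x
y_hat = expected_conditional_average(np.array([1.0, 2.0]), [f0, f1], [0.5, 0.5])
# y_hat is the 50/50 mixture of the two predictors: [0.5, 1.0]
```

The point of the paper's assumptions is that this mixture remains simple and linear-time in the number of $Z$ values, rather than requiring a complex joint model over $X$, $Y$, and $Z$.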
📝 Abstract
We consider the task of out-of-distribution (OOD) generalization, where the distribution shift is due to an unobserved confounder ($Z$) affecting both the covariates ($X$) and the labels ($Y$). In this setting, the traditional assumptions of covariate and label shift are unsuitable due to the confounding, which introduces heterogeneity in the predictor, i.e., $\hat{Y} = f_Z(X)$. OOD generalization differs from traditional domain adaptation by not assuming access to the covariate distribution ($X^{\text{te}}$) of the test samples during training. These conditions create a challenging scenario for OOD robustness: (a) $Z^{\text{tr}}$ is an unobserved confounder during training, (b) $P^{\text{te}}(Z) \neq P^{\text{tr}}(Z)$, (c) $X^{\text{te}}$ is unavailable during training, and (d) the posterior predictive distribution depends on $P^{\text{te}}(Z)$, i.e., $\hat{Y} = \mathbb{E}_{P^{\text{te}}(Z)}[f_Z(X)]$. In general, accurate predictions are unattainable in this scenario, and the existing literature has proposed complex predictors based on identifiability assumptions that require multiple additional variables. Our work investigates a set of identifiability assumptions that tremendously simplify the predictor, whose resulting elegant simplicity outperforms existing approaches.