🤖 AI Summary
This work addresses the challenge of identifying robust predictors when distribution shifts arise from latent confounders and proxy variables violate the completeness assumption. To characterize the set of indistinguishable confounding structures under imperfect proxies, the authors introduce Latent Equivalence Classes (LECs). They establish a weaker cross-domain rank condition on mixture weights that enables point identification of the robust predictor. Furthermore, they propose a novel Proximal Quasi-Bayesian Active Learning (PQAL) framework to efficiently select the minimal number of source domains needed to satisfy the identification condition. Experiments on synthetic data and the semi-synthetic dSprites dataset demonstrate that the proposed method accurately recovers robust predictors and significantly outperforms baseline approaches across various distribution shifts.
📝 Abstract
The domain adaptation problem becomes more challenging when distribution shifts across domains stem from latent confounders that affect both covariates and outcomes. Existing proxy-based approaches to latent shift rely on a strong completeness assumption to uniquely determine (point-identify) a robust predictor. Completeness requires that the proxies carry sufficient information about variations in the latent confounders. With imperfect proxies, the mapping from confounders to the space of proxy distributions is non-injective: multiple latent confounder values can generate the same proxy distribution. This breaks the completeness assumption, and the observed data are consistent with multiple potential predictors (the predictor is only set-identified). To address this, we introduce latent equivalence classes (LECs), defined as groups of latent confounders that induce the same conditional proxy distribution. We show that point identification of the robust predictor remains achievable as long as multiple domains differ sufficiently in how they mix proxy-induced LECs to form the robust predictor. This domain diversity condition is formalized as a cross-domain rank condition on the mixture weights, which is a substantially weaker assumption than completeness. We introduce the Proximal Quasi-Bayesian Active Learning (PQAL) framework, which actively queries a minimal set of diverse domains that satisfy this rank condition. PQAL efficiently recovers the point-identified predictor, demonstrates robustness to varying degrees of shift, and outperforms previous methods on synthetic data and the semi-synthetic dSprites dataset.
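The cross-domain rank condition and the active querying of domains can be illustrated with a toy sketch. Here each domain is summarized by a row of mixture weights over `K` hypothetical LECs; the predictor is treated as point-identified once the stacked weight matrix reaches rank `K`. The Dirichlet-sampled weights, the greedy acquisition rule, and the helper `satisfies_rank_condition` are all illustrative assumptions standing in for the paper's quasi-Bayesian machinery, not the authors' actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3  # number of latent equivalence classes (LECs) -- illustrative choice
# Candidate source domains, each summarized by a mixture-weight row over the
# K LECs. These weights are synthetic; in the paper they would be induced by
# the proxy model, not sampled directly.
candidates = rng.dirichlet(np.ones(K), size=10)

def satisfies_rank_condition(rows, k):
    """Toy version of the cross-domain rank condition: the stacked
    mixture-weight matrix must have rank k for point identification."""
    return np.linalg.matrix_rank(np.vstack(rows)) >= k

# Greedy stand-in for active domain selection: keep a candidate domain only
# if it increases the rank of the mixture-weight matrix, and stop querying
# once the rank condition holds.
selected = [candidates[0]]
for w in candidates[1:]:
    if satisfies_rank_condition(selected, K):
        break
    old_rank = np.linalg.matrix_rank(np.vstack(selected))
    if np.linalg.matrix_rank(np.vstack(selected + [w])) > old_rank:
        selected.append(w)

print(len(selected), satisfies_rank_condition(selected, K))
```

With generic random weights, each accepted domain raises the rank by one, so the loop stops after exactly `K` domains: diversity of domains, not informativeness of any single proxy, is what delivers identification.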