Efficient and Sharp Off-Policy Learning under Unobserved Confounding

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the bias that unobserved confounding induces in off-policy evaluation and learning, this paper proposes a robust offline policy learning framework driven by causal sensitivity analysis. Methodologically, it avoids the unstable minimax optimization over inverse-propensity-weighted outcomes used in prior work and instead directly estimates sharp, robust bounds on the value function. Theoretically, the resulting policy is proven optimal among confounding-robust policies, and the bound estimator attains nonparametric statistical efficiency. Key contributions include: (i) a unification of causal sensitivity analysis with offline policy optimization that circumvents the strong unconfoundedness assumption; and (ii) a statistically efficient estimator of sharp value-function bounds, with an extension to policy improvement over a baseline policy such as the standard of care. Experiments on synthetic benchmarks and real-world clinical data show that the method outperforms simple plug-in estimators and existing baselines, improving policy safety and decision reliability.

📝 Abstract
We develop a novel method for personalized off-policy learning in scenarios with unobserved confounding. In doing so, we address a key limitation of standard policy learning: it assumes unconfoundedness, meaning that no unobserved factors influence both treatment assignment and outcomes. However, this assumption is often violated in practice, in which case standard policy learning produces biased estimates and thus policies that can be harmful. To address this limitation, we employ causal sensitivity analysis and derive a statistically efficient estimator for a sharp bound on the value function under unobserved confounding. Our estimator has three advantages: (1) unlike existing works, it avoids unstable minimax optimization based on inverse-propensity-weighted outcomes; (2) it is statistically efficient; (3) we prove that it leads to the optimal confounding-robust policy. Finally, we extend our theory to the related task of policy improvement under unobserved confounding, i.e., when a baseline policy such as the standard of care is available. We show in experiments with synthetic and real-world data that our method outperforms simple plug-in approaches and existing baselines. Our method is highly relevant for decision-making where unobserved confounding can be problematic, such as in healthcare and public policy.
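Sharp bounds from causal sensitivity analysis are the core object the abstract describes. As a minimal, hypothetical sketch (not the paper's estimator), the following computes the classic sharp upper bound on a mean outcome under a marginal sensitivity model with parameter Λ: the likelihood ratio between the observed and confounded outcome distributions is constrained to [1/Λ, Λ], and the worst case up-weights outcomes above the τ-quantile by Λ and down-weights those below it by 1/Λ, with τ = Λ/(1+Λ). The function name and data are illustrative only.

```python
def sharp_upper_bound(y, lam):
    """Worst-case (sharp upper) bound on the mean of outcomes y over all
    likelihood-ratio reweightings constrained to [1/lam, lam].

    The optimum weights outcomes above the tau-quantile by lam and outcomes
    below it by 1/lam, where tau = lam / (1 + lam); the final division
    renormalizes the weights to average one.
    """
    ys = sorted(y)
    tau = lam / (1.0 + lam)
    k = int(tau * len(ys))          # cutoff index of the tau-quantile
    low, high = ys[:k], ys[k:]      # down- and up-weighted outcomes
    total = (1.0 / lam) * sum(low) + lam * sum(high)
    norm = (1.0 / lam) * len(low) + lam * len(high)
    return total / norm

y = [0.1, 0.4, 0.5, 0.9, 1.2]
print(sharp_upper_bound(y, 1.0))    # lam = 1: no confounding, plain mean
print(sharp_upper_bound(y, 2.0))    # lam > 1: bound moves above the mean
```

With Λ = 1 the constraint forces unit weights and the bound collapses to the sample mean; larger Λ admits more unobserved confounding and widens the bound. The paper's contribution, per the abstract, is an efficient estimator of such sharp bounds on the policy value, avoiding minimax optimization over inverse-propensity-weighted outcomes.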
Problem

Research questions and friction points this paper is trying to address.

personalized off-policy learning
unobserved confounding
lack of statistically efficient estimators under confounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal sensitivity analysis method
Statistically efficient estimator
Optimal confounding-robust policy