Uncertainty Quantification and Causal Considerations for Off-Policy Decision Making

📅 2025-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Off-policy evaluation (OPE) faces three fundamental challenges: high estimator variance, difficulty in finite-sample uncertainty quantification, and causal non-identifiability under unobserved confounding. To address these, the thesis proposes: (1) the Marginal Ratio (MR) estimator, which reduces importance-sampling variance by weighting on the marginal distribution of outcomes rather than the full policy ratio; (2) the Conformal Off-Policy Prediction (COPP) framework, which delivers finite-sample, distribution-free prediction intervals for OPE with valid coverage; and (3) causal bounds for sequential decision settings that remain valid under arbitrary unmeasured confounding, coupled with a falsification framework for digital twin models. The approach integrates marginal distribution estimation, conformal inference, and sensitivity analysis, balancing theoretical rigour with practical implementability. Experiments across static and sequential decision-making environments show consistent improvements in OPE robustness, calibration, and reliability.

📝 Abstract
Off-policy evaluation (OPE) is a critical challenge in robust decision-making that seeks to assess the performance of a new policy using data collected under a different policy. However, existing OPE methodologies suffer from several limitations arising from statistical uncertainty as well as causal considerations. In this thesis, we address these limitations through three contributions. First, we consider the problem of high variance in importance-sampling-based OPE estimators. We introduce the Marginal Ratio (MR) estimator, a novel OPE method that reduces variance by focusing on the marginal distribution of outcomes rather than direct policy shifts, improving robustness in contextual bandits. Next, we propose Conformal Off-Policy Prediction (COPP), a principled approach for uncertainty quantification in OPE that provides finite-sample predictive intervals, ensuring robust decision-making in risk-sensitive applications. Finally, we address causal unidentifiability in off-policy decision-making by developing novel bounds for sequential decision settings, which remain valid under arbitrary unmeasured confounding. We apply these bounds to assess the reliability of digital twin models, introducing a falsification framework to identify scenarios where model predictions diverge from real-world behaviour. Our contributions provide new insights into robust decision-making under uncertainty and establish principled methods for evaluating policies in both static and dynamic settings.
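The variance argument behind the MR estimator can be illustrated with a toy discrete contextual bandit. The sketch below is not the thesis's implementation: all policies and rewards are made up, and the marginal reward distributions are computed exactly (in practice the outcome-marginal ratio must itself be estimated, e.g. with density-ratio models). It contrasts the standard importance-sampling weight pi(a|x)/pi_b(a|x) with a weight that depends only on the observed outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy contextual bandit (all numbers illustrative, not from the thesis):
# 2 contexts, 2 actions, deterministic reward r(x, a).
P_X = np.array([0.5, 0.5])              # context distribution
PI_B = np.array([[0.9, 0.1],            # behaviour policy pi_b(a|x)
                 [0.2, 0.8]])
PI_T = np.array([[0.3, 0.7],            # target policy pi(a|x)
                 [0.6, 0.4]])
REWARD = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

# Logged data collected under the behaviour policy.
n = 20000
x = rng.choice(2, size=n, p=P_X)
a = np.array([rng.choice(2, p=PI_B[xi]) for xi in x])
r = REWARD[x, a]

# Standard importance sampling: weight each sample by pi(a|x)/pi_b(a|x).
w_is = PI_T[x, a] / PI_B[x, a]
v_is = np.mean(w_is * r)

# Marginal-ratio-style weighting: the weight depends only on the *outcome*,
# via the ratio of marginal reward distributions under the two policies.
p_r_b = np.array([(P_X[:, None] * PI_B)[REWARD == v].sum() for v in (0.0, 1.0)])
p_r_t = np.array([(P_X[:, None] * PI_T)[REWARD == v].sum() for v in (0.0, 1.0)])
idx = (r == 1.0).astype(int)
w_mr = p_r_t[idx] / p_r_b[idx]
v_mr = np.mean(w_mr * r)

# Both estimates target the true policy value (0.35 in this toy problem),
# but the MR-style estimator has lower per-sample variance.
print(f"IS estimate: {v_is:.3f}  (per-sample variance {np.var(w_is * r):.4f})")
print(f"MR estimate: {v_mr:.3f}  (per-sample variance {np.var(w_mr * r):.4f})")
```

The true policy value here is 0.5·0.3 + 0.5·0.4 = 0.35; both estimators recover it, but the outcome-marginal weights average out the context-action variability that inflates the plain importance-sampling weights.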
Problem

Research questions and friction points this thesis is trying to address.

Reduces variance in off-policy evaluation estimators
Provides finite-sample predictive intervals in OPE
Addresses causal unidentifiability in sequential decision settings
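The second friction point above, finite-sample predictive intervals under a policy shift, can be sketched with a simplified weighted split-conformal procedure. This is only an illustration under invented data and context-free policies; COPP itself handles context-dependent policies and uses a more careful weighting scheme (the test point's own weight also enters the quantile).

```python
import numpy as np

rng = np.random.default_rng(1)

def weighted_quantile(scores, weights, alpha):
    """Smallest score whose cumulative normalised weight reaches 1 - alpha."""
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cum = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cum, 1 - alpha)
    return s[min(idx, len(s) - 1)]

# Logged bandit data: behaviour policy picks a = 1 with prob 0.5,
# target policy would pick a = 1 with prob 0.9 (both context-free here).
n = 4000
p_b, p_t = 0.5, 0.9
x = rng.uniform(0, 2, size=n)
a = (rng.uniform(size=n) < p_b).astype(float)
y = a * x + rng.normal(0, 0.1, size=n)

# Split: first half fits a crude point predictor, second half calibrates.
half = n // 2
coef = np.polyfit(x[:half] * a[:half], y[:half], 1)
pred = lambda xv, av: np.polyval(coef, xv * av)

scores = np.abs(y[half:] - pred(x[half:], a[half:]))           # conformity scores
w = np.where(a[half:] == 1, p_t / p_b, (1 - p_t) / (1 - p_b))  # pi(a) / pi_b(a)
q = weighted_quantile(scores, w, alpha=0.1)                    # interval half-width

# Check coverage on fresh outcomes generated under the TARGET policy.
m = 5000
xt = rng.uniform(0, 2, size=m)
at = (rng.uniform(size=m) < p_t).astype(float)
yt = at * xt + rng.normal(0, 0.1, size=m)
coverage = np.mean(np.abs(yt - pred(xt, at)) <= q)
print(f"target-policy coverage at nominal 90%: {coverage:.3f}")
```

The key move is reweighting the calibration scores by the policy ratio so that the quantile is taken under the target policy's distribution rather than the behaviour policy's, which is what lets the interval retain coverage after the policy shift.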
Innovation

Methods, ideas, or system contributions that make the work stand out.

Marginal Ratio estimator reduces variance
Conformal Off-Policy Prediction ensures robust decisions
Novel bounds address causal unidentifiability
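The third innovation, bounds that stay valid under unmeasured confounding, builds on a classic single-step idea that is easy to demonstrate numerically. The sketch below uses Manski-style worst-case bounds for a bounded outcome; everything here (the data-generating process, the numbers) is invented for illustration, and the thesis's contribution is the sequence-wise extension of such bounds, not this one-step version.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic observational data with an UNOBSERVED confounder u.
n = 100_000
u = rng.uniform(size=n)                                    # hidden confounder
a_obs = (rng.uniform(size=n) < 0.3 + 0.5 * u).astype(int)  # treatment depends on u
noise = rng.normal(0, 0.05, size=n)
y = np.clip(0.2 + 0.5 * u + 0.2 * a_obs + noise, 0, 1)     # outcome in [0, 1]

# Naive point estimate of E[Y(1)] ignores confounding and is biased upward,
# because units with large u are both more likely treated and higher-outcome.
naive = y[a_obs == 1].mean()

# Manski-style worst-case bounds use only observed quantities plus the fact
# that Y lies in [0, 1]; they hold under ARBITRARY unmeasured confounding:
#   E[Y(1)] in [ E[Y|A=1] P(A=1) + 0 * P(A=0),  E[Y|A=1] P(A=1) + 1 * P(A=0) ]
p1 = (a_obs == 1).mean()
lower = naive * p1 + 0.0 * (1 - p1)   # untreated units' counterfactual set to 0
upper = naive * p1 + 1.0 * (1 - p1)   # untreated units' counterfactual set to 1

# Ground truth, available only because this is a simulation:
true_ey1 = np.clip(0.2 + 0.5 * u + 0.2 + noise, 0, 1).mean()
print(f"naive: {naive:.3f}, truth: {true_ey1:.3f}, "
      f"bounds: [{lower:.3f}, {upper:.3f}]")
```

The naive estimate misses the truth while the bounds contain it, which is the basic guarantee the thesis extends to sequential settings and then uses to falsify digital twin predictions that fall outside the bounds.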