Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings

📅 2025-06-03

🤖 AI Summary

This paper addresses the problem of estimating counterfactual outcome distributions under alternative policies, a task central to recommendation systems, online advertising, and medical decision-making. We propose the Counterfactual Policy Mean Embedding (CPME) framework, which nonparametrically represents the full counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS), enabling distributional off-policy evaluation, hypothesis testing, and counterfactual sampling. To our knowledge, this is the first work to extend doubly robust estimation to *distribution-level* counterfactual inference. We construct consistent CPME estimators, including a doubly robust estimator with improved uniform convergence rates, and derive a doubly robust kernel test statistic that is asymptotically normal, which makes testing computationally efficient and confidence intervals straightforward to construct. Numerical simulations illustrate the practical benefits of CPME over existing distribution-level evaluation methods.

📝 Abstract

Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We analyze a novel framework, Counterfactual Policy Mean Embedding (CPME), that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS), enabling flexible and nonparametric distributional off-policy evaluation. We introduce both a plug-in estimator and a doubly robust estimator; the latter enjoys improved uniform convergence rates by correcting for bias in both the outcome embedding and propensity models. Building on this, we develop a doubly robust kernel test statistic for hypothesis testing, which achieves asymptotic normality and thus enables computationally efficient testing and straightforward construction of confidence intervals. Our framework also supports sampling from the counterfactual distribution. Numerical simulations illustrate the practical benefits of CPME over existing methods.
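To make the plug-in and doubly robust estimators concrete, here is a minimal sketch, not the authors' implementation. It assumes a finite action set, an RBF kernel on contexts multiplied by an indicator kernel on actions, and a kernel ridge estimate of the conditional outcome embedding; the function names `rbf` and `cpme_weights` and the parameters `lam` and `gamma` are hypothetical. The estimated embedding is returned as weights `c`, so that the embedding is `sum_j c[j] * phi(y_j)` over the observed outcomes.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF Gram matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def cpme_weights(X, A, target_probs, logging_probs, lam=1e-2, gamma=1.0):
    """Weights over phi(y_j) for plug-in and doubly robust embedding estimates.

    X: (n, d) contexts; A: (n,) integer actions in 0..K-1;
    target_probs, logging_probs: (n, K) policy probabilities at each x_i.
    Assumption: the outcome embedding mu(a, x) is fit by kernel ridge
    regression with kernel 1[a == a'] * rbf(x, x').
    """
    n, K = target_probs.shape
    Kx = rbf(X, X, gamma)
    G = Kx * (A[:, None] == A[None, :])      # Gram matrix on (action, context) pairs
    G_reg = G + n * lam * np.eye(n)

    # Plug-in term: (1/n) sum_i sum_a pi(a | x_i) * beta(a, x_i)
    plug = np.zeros(n)
    for a in range(K):
        k_a = Kx * (A[:, None] == a)         # entry (j, i): k((A_j, X_j), (a, x_i))
        B_a = np.linalg.solve(G_reg, k_a)    # column i holds beta(a, x_i)
        plug += B_a @ target_probs[:, a]
    plug /= n

    # Correction term: (1/n) sum_i w_i * (e_i - beta(A_i, x_i))
    idx = np.arange(n)
    w = target_probs[idx, A] / logging_probs[idx, A]   # importance weights
    B_obs = np.linalg.solve(G_reg, G)        # column i holds beta(A_i, x_i)
    corr = (w - B_obs @ w) / n

    return plug, plug + corr                 # (plug-in, doubly robust)

# Usage on synthetic data (all values here are illustrative).
rng = np.random.default_rng(0)
n, K = 60, 3
X = rng.normal(size=(n, 2))
logging_probs = np.full((n, K), 1.0 / K)     # uniform logging policy
A = rng.integers(0, K, size=n)
logits = X @ rng.normal(size=(2, K))
target_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
Y = X[:, 0] + A + 0.1 * rng.normal(size=n)

c_plug, c_dr = cpme_weights(X, A, target_probs, logging_probs)
# For a function f in the RKHS, E[f(Y)] under the target policy is
# approximated by sum_j c[j] * f(y_j); f(y) = y is used here purely
# as an illustration.
print(float(c_plug @ Y), float(c_dr @ Y))
```

In this sketch the correction term reweights the regression residuals by the importance weights, so an error in either the outcome embedding or the propensities can be offset by the other model. As the ridge penalty `lam` grows, the outcome model shrinks to zero and the doubly robust weights reduce to plain importance weights `w / n`.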

Problem

Research questions and friction points this paper is trying to address.

Estimating outcome distributions under counterfactual policies for decision-making

Developing doubly robust estimators for improved convergence rates

Enabling flexible nonparametric distributional off-policy evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

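Represents the entire counterfactual outcome distribution in an RKHS

Uses a doubly robust estimator that improves uniform convergence rates by correcting for bias in both the outcome embedding and propensity models

Develops a doubly robust kernel test statistic that is asymptotically normal, enabling efficient hypothesis testing and confidence intervals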
