Unifying On- and Off-Policy Variance Reduction Methods

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the long-standing methodological disconnect between online A/B testing and off-policy evaluation, which has hindered synergistic improvements in experimental efficiency. We establish, for the first time, a formal mathematical equivalence between the Difference-in-Means estimator commonly used in online experiments and the Inverse Propensity Score (IPS) estimator augmented with optimal control variates in off-policy evaluation. Furthermore, we unify regression-adjusted techniques such as CUPED within a doubly robust estimation framework. By demonstrating formal equivalence in variance reduction strategies across these two paradigms, our analysis provides a coherent theoretical foundation that harmonizes prevailing methodologies in both online experimentation and off-policy evaluation, enabling more consistent and efficient causal inference practices.
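The central equivalence can be illustrated numerically. The sketch below is not code from the paper: it assumes a Bernoulli(p) randomised assignment and uses the self-normalised (Hájek) form of the IPS estimator, which coincides with IPS plus the optimal additive control variate when the propensity is constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10_000, 0.3                       # sample size, treatment propensity
t = rng.binomial(1, p, n)                # randomised treatment assignment
y = 1.0 + 2.0 * t + rng.normal(0, 1, n)  # outcomes with true effect 2.0

# Online view: Difference-in-Means estimator
dim = y[t == 1].mean() - y[t == 0].mean()

# Off-policy view: IPS value of "always treat" minus "never treat",
# each self-normalised (equivalent to IPS with the variance-minimising
# additive control variate under a constant propensity)
w1, w0 = t / p, (1 - t) / (1 - p)
ips = (w1 * y).sum() / w1.sum() - (w0 * y).sum() / w0.sum()

assert np.isclose(dim, ips)  # the two estimators coincide exactly
```

With a constant propensity the self-normalised IPS value for each arm reduces algebraically to that arm's sample mean, so the difference recovers Difference-in-Means exactly, not just in expectation.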

📝 Abstract
Continuous and efficient experimentation is key to the practical success of user-facing applications on the web, both through online A/B-tests and off-policy evaluation. Despite their shared objective -- estimating the incremental value of a treatment -- these domains often operate in isolation, utilising distinct terminologies and statistical toolkits. This paper bridges that divide by establishing a formal equivalence between their canonical variance reduction methods. We prove that the standard online Difference-in-Means estimator is mathematically identical to an off-policy Inverse Propensity Scoring estimator equipped with an optimal (variance-minimising) additive control variate. Extending this unification, we demonstrate that widespread regression adjustment methods (such as CUPED, CUPAC, and ML-RATE) are structurally equivalent to Doubly Robust estimation. This unified view extends our understanding of commonly used approaches, and can guide practitioners and researchers working on either class of problems.
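The regression-adjustment side of the unification can be sketched with standard CUPED: subtract from each outcome the OLS projection onto a pre-experiment covariate, then apply Difference-in-Means as usual. This is a generic illustration with invented data (covariate `x`, effect sizes, and noise are assumptions), not the paper's own experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 10_000, 0.5
x = rng.normal(0, 1, n)                  # pre-experiment covariate
t = rng.binomial(1, p, n)                # randomised treatment assignment
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(0, 1, n)

# CUPED: adjust outcomes using the OLS coefficient of y on x,
# theta = Cov(x, y) / Var(x), then difference the adjusted means
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_adj = y - theta * (x - x.mean())

dim = y[t == 1].mean() - y[t == 0].mean()
dim_cuped = y_adj[t == 1].mean() - y_adj[t == 0].mean()
```

Both estimators are unbiased for the true effect (2.0 here); the adjusted one has its variance shrunk by roughly a factor of 1 − ρ², where ρ is the correlation between `y` and `x`. In the doubly-robust reading, the linear fit `theta * x` plays the role of the outcome-regression model.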
Problem

Research questions and friction points this paper is trying to address.

on-policy
off-policy
variance reduction
causal inference
experimentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

variance reduction
off-policy evaluation
A/B testing
doubly robust estimation
control variates