🤖 AI Summary
This work addresses the long-standing methodological disconnect between online A/B testing and off-policy evaluation, which has kept advances in experimental efficiency from transferring between the two fields. We establish, for the first time, a formal mathematical equivalence between the Difference-in-Means estimator commonly used in online experiments and the Inverse Propensity Scoring (IPS) estimator augmented with an optimal control variate in off-policy evaluation. Furthermore, we unify regression-adjusted techniques such as CUPED within a doubly robust estimation framework. Together, these equivalences place the variance reduction strategies of both paradigms on a single coherent theoretical foundation, enabling more consistent and efficient causal inference practice.
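As a sketch of the first equivalence (notation and derivation ours, assuming Bernoulli(e) treatment assignment with known propensity e; the paper's own statement may differ), consider estimating the treatment-arm value V_1 = E[Y(1)] with an IPS estimator carrying an additive control variate built from the mean-zero weight residual T/e - 1:

```latex
% IPS estimator for the treatment-arm value with an additive control
% variate; E[T_i/e - 1] = 0 under Bernoulli(e) assignment.
\[
\hat{V}_1(\beta) \;=\; \frac{1}{n}\sum_{i=1}^{n}
  \left[ \frac{T_i}{e}\,Y_i \;-\; \beta\left(\frac{T_i}{e} - 1\right) \right].
\]
% The variance-minimising coefficient is the target value itself:
\[
\beta^{*}
  \;=\; \frac{\operatorname{Cov}\!\left(\frac{T}{e}Y,\; \frac{T}{e}\right)}
             {\operatorname{Var}\!\left(\frac{T}{e}\right)}
  \;=\; \mathbb{E}\!\left[\,Y(1)\,\right],
\]
% and substituting its sample plug-in estimate collapses the estimator
% to the treatment-arm sample mean:
\[
\hat{V}_1(\hat{\beta}^{*})
  \;=\; \frac{\sum_{i} T_i Y_i}{\sum_{i} T_i}
  \;=\; \frac{1}{n_1}\sum_{i\,:\,T_i = 1} Y_i .
\]
```

Applying the same argument to the control arm and taking the difference recovers the Difference-in-Means estimator exactly.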
📝 Abstract
Continuous and efficient experimentation is key to the practical success of user-facing applications on the web, both through online A/B-tests and off-policy evaluation. Despite their shared objective -- estimating the incremental value of a treatment -- these domains often operate in isolation, utilising distinct terminologies and statistical toolkits. This paper bridges that divide by establishing a formal equivalence between their canonical variance reduction methods. We prove that the standard online Difference-in-Means estimator is mathematically identical to an off-policy Inverse Propensity Scoring estimator equipped with an optimal (variance-minimising) additive control variate. Extending this unification, we demonstrate that widespread regression adjustment methods (such as CUPED, CUPAC, and ML-RATE) are structurally equivalent to Doubly Robust estimation. This unified view deepens our understanding of commonly used approaches and can guide practitioners and researchers working on either class of problems.
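The following minimal simulation (our illustration, not code from the paper) checks both equivalences numerically: IPS with empirical propensities -- the plug-in form of the optimal additive control variate above -- coincides with Difference-in-Means, and a CUPED-style adjustment coincides with a Doubly Robust estimator that reuses the same pooled linear outcome model for both arms.

```python
import numpy as np

rng = np.random.default_rng(0)
n, e = 10_000, 0.3                    # sample size, treatment propensity

# Synthetic experiment: pre-experiment covariate x, Bernoulli(e) assignment.
x = rng.normal(size=n)
t = rng.binomial(1, e, size=n)
y = 2.0 * x + 1.5 * t + rng.normal(size=n)    # true treatment effect = 1.5

# (a) Difference-in-Means vs. IPS with empirical propensities, i.e. the
#     self-normalised / plug-in optimal-control-variate form of IPS.
dm = y[t == 1].mean() - y[t == 0].mean()
e_hat = t.mean()
ips_cv = np.mean(t / e_hat * y - (1 - t) / (1 - e_hat) * y)
assert np.isclose(dm, ips_cv)

# (b) CUPED vs. Doubly Robust with a shared linear outcome model m(x).
#     Because both arms use the same m(x), the direct term of the DR
#     estimator cancels, leaving only the reweighted residuals.
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # pooled CUPED coefficient
y_adj = y - theta * (x - x.mean())
cuped = y_adj[t == 1].mean() - y_adj[t == 0].mean()

m = theta * x                                     # m(x); intercept cancels
dr = np.mean(t / e_hat * (y - m) - (1 - t) / (1 - e_hat) * (y - m))
assert np.isclose(cuped, dr)

print(f"DM = IPS+CV = {dm:.4f}   CUPED = DR = {cuped:.4f}")
```

In this construction the CUPED/DR identity holds for any coefficient theta, not just the variance-optimal one, which reflects the structural (rather than merely asymptotic) nature of the equivalence.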