Subset Selection for Stratified Sampling in Online Controlled Experiments

Date: 2025-09-19
AI Summary
This paper addresses the critical problem of variance reduction via stratified sampling in online A/B testing. We propose an efficient stratification variable subset selection algorithm that dynamically evaluates the marginal contribution of each variable to estimation variance through layer-wise simulation of the stratification process, enabling precise identification of high-information stratification variables, even under multivariate correlation. Unlike conventional approaches relying on pairwise correlation or heuristic filtering, our method directly optimizes for variance minimization, ensuring both theoretical interpretability and computational efficiency. Experiments on synthetic and real-world business datasets demonstrate that our approach reduces estimation variance by 18%–32% on average compared to classical methods such as covariate adjustment and CUPED. This translates into significantly improved statistical power and experimental sensitivity, facilitating faster and more reliable causal inference in production A/B testing environments.

Abstract
Online controlled experiments, also known as A/B testing, are the digital equivalent of randomized controlled trials for estimating the impact of marketing campaigns on website visitors. Stratified sampling is a traditional technique for variance reduction to improve the sensitivity (or statistical power) of controlled experiments; this technique first divides the population into strata (homogeneous subgroups) based on stratification variables and then draws samples from each stratum to avoid sampling bias. To enhance the estimation accuracy of stratified sampling, we focus on the problem of selecting a subset of stratification variables that are effective in variance reduction. We design an efficient algorithm that selects stratification variables one by one by simulating a series of stratified sampling processes. We also estimate the computational complexity of our subset selection algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our method can outperform other variance reduction techniques, especially when multiple variables have a certain correlation with the outcome variable. Our subset selection method for stratified sampling can improve the sensitivity of online controlled experiments, thus enabling more reliable marketing decisions.
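To make the variance-reduction idea concrete, the following is a minimal sketch, not the paper's procedure. It compares the variance of a sample mean under simple random sampling with that of a stratified mean under proportional allocation, ignoring finite-population corrections. The function names, the proportional-allocation assumption, and the synthetic data are all illustrative choices that do not come from the paper.

```python
import numpy as np


def srs_variance(y, n):
    """Approximate variance of the mean of n units drawn by simple random sampling."""
    y = np.asarray(y, dtype=float)
    return float(np.var(y, ddof=1) / n)


def stratified_variance(y, strata, n):
    """Approximate variance of the stratified mean with proportional allocation.

    With n_h = n * W_h, the usual sum of W_h^2 * S_h^2 / n_h
    simplifies to sum of W_h * S_h^2 / n.
    """
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    total = 0.0
    for s in np.unique(strata):
        y_h = y[strata == s]
        weight = len(y_h) / len(y)                      # W_h: stratum share
        s2_h = np.var(y_h, ddof=1) if len(y_h) > 1 else 0.0
        total += weight * s2_h / n
    return float(total)


# Hypothetical data: one binary stratification variable that shifts the outcome.
rng = np.random.default_rng(0)
segment = rng.integers(0, 2, size=10_000)
outcome = 5.0 * segment + rng.normal(size=10_000)

print("SRS variance:       ", srs_variance(outcome, n=500))
print("Stratified variance:", stratified_variance(outcome, segment, n=500))
```

When the stratification variable explains much of the outcome, the strata are internally homogeneous and the stratified variance is far smaller. That gap is what the choice of stratification variables is meant to widen.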
Problem

Research questions and friction points this paper is trying to address.

Selecting stratification variables that are effective for variance reduction
Designing an efficient algorithm for subset selection in stratified sampling
Improving the sensitivity of online controlled experiments through optimized sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

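Selects stratification variables sequentially, one at a time
Simulates a series of stratified sampling processes to evaluate variance reduction
Outperforms other variance reduction techniques, especially when multiple variables are correlated with the outcome variable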
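The one-by-one selection idea can be illustrated with a simplified greedy forward-selection sketch. This is an assumption-laden stand-in, not the authors' algorithm: the paper scores candidates by simulating stratified sampling, whereas this sketch scores them by the in-sample weighted within-stratum variance of the outcome. The names `within_strata_variance` and `greedy_select`, and the `max_vars` and `min_gain` parameters, are hypothetical.

```python
import numpy as np


def within_strata_variance(y, X_cols):
    """Weighted within-stratum variance of y; strata are the joint values of X_cols."""
    y = np.asarray(y, dtype=float)
    # Label each row by its unique combination of stratification values.
    _, labels = np.unique(X_cols, axis=0, return_inverse=True)
    labels = np.asarray(labels).ravel()
    counts = np.bincount(labels)
    means = np.bincount(labels, weights=y) / counts
    mean_sq = np.bincount(labels, weights=y * y) / counts
    var_h = np.maximum(mean_sq - means**2, 0.0)  # guard against rounding error
    return float(np.sum(counts / len(y) * var_h))


def greedy_select(X, y, max_vars=3, min_gain=0.01):
    """Add, one at a time, the column that most reduces within-stratum variance.

    In-sample within-stratum variance never increases when a variable is added,
    so selection stops once the gain falls below min_gain * (total variance).
    """
    X = np.asarray(X)
    base = float(np.var(np.asarray(y, dtype=float)))
    best, selected = base, []
    while len(selected) < max_vars:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        scores = {j: within_strata_variance(y, X[:, selected + [j]])
                  for j in candidates}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < min_gain * base:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best


# Hypothetical data: four discrete covariates, two of which drive the outcome.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 3, size=(5_000, 4))
y_demo = 3.0 * X_demo[:, 0] + 2.0 * X_demo[:, 2] + rng.normal(size=5_000)
print(greedy_select(X_demo, y_demo))
```

The stopping threshold matters in this sketch because finer strata always look at least as good in-sample. Scoring candidates by simulated stratified sampling, as the abstract describes, is one way to avoid rewarding variables that only split the data into tiny strata.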
Haru Momozu
University of Tsukuba, Tsukuba, Ibaraki 305-8573 Japan
Yuki Uehara
Preferred Networks, Inc., Chiyoda-ku, Tokyo 100-0004 Japan
Naoki Nishimura
University of Tsukuba, Tsukuba, Ibaraki 305-8573 Japan
Koya Ohashi
Mercari, Inc., Minato-ku, Tokyo 106-6118 Japan
Deddy Jobson
Mercari, Inc., Minato-ku, Tokyo 106-6118 Japan
Yilin Li
University of Washington
conjugated polymers, luminescent solar concentrators
Phuong Dinh
Mercari, Inc., Minato-ku, Tokyo 106-6118 Japan
Noriyoshi Sukegawa
Department of Advanced Sciences, Faculty of Science and Engineering, Hosei University
Integer Programming, Combinatorial Optimization, Polyhedral Combinatorics, Operations Research
Yuichi Takano
University of Tsukuba, Tsukuba, Ibaraki 305-8573 Japan