Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization

📅 2021-12-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Adaptive experiments face a fundamental trade-off between statistical reliability (controlling false positives and false negatives) and cumulative reward maximization. Uniform random (UR) assignment supports valid statistical inference but forgoes reward, whereas Thompson Sampling (TS) accumulates more reward at the cost of biased estimates and reduced statistical power. This paper proposes TS-PostDiff, an adaptive allocation method that lets experimenters define what counts as a "small" difference in arm means. On each assignment, the probability of using UR rather than TS is proportional to the posterior probability that the difference between arms falls below this threshold, so the algorithm behaves like a traditional UR experiment when arms look similar and like TS when one arm appears clearly better. Theoretical analysis and two-arm simulations across settings inspired by real-world applications show that TS-PostDiff reduces false positive rates and increases statistical power when true effects are small, while increasing cumulative reward when effects are large. Evaluated against UR, standard TS, and two TS variants designed to improve statistical inference, it offers a favorable trade-off between statistical sensitivity and reward.
📝 Abstract
Traditional randomized A/B experiments assign arms with uniform random (UR) probability, such as 50/50 assignment to two versions of a website to discover whether one version engages users more. To more quickly and automatically use data to benefit users, multi-armed bandit algorithms such as Thompson Sampling (TS) have been advocated. While TS is interpretable and incorporates the randomization key to statistical inference, it can cause biased estimates and increase false positives and false negatives in detecting differences in arm means. We introduce a more Statistically Sensitive algorithm, TS-PostDiff (Posterior Probability of Small Difference), that mixes TS with traditional UR by using an additional adaptive step, where the probability of using UR (vs TS) is proportional to the posterior probability that the difference in arms is small. This allows an experimenter to define what counts as a small difference, below which a traditional UR experiment can obtain informative data for statistical inference at low cost, and above which using more TS to maximize user benefits is key. We evaluate TS-PostDiff against UR, TS, and two other TS variants designed to improve statistical inference. We consider results for the common two-armed experiment across a range of settings inspired by real-world applications. Our results provide insight into when and why TS-PostDiff or alternative approaches provide better tradeoffs between benefiting users (reward) and statistical inference (false positive rate and power). TS-PostDiff's adaptivity helps efficiently reduce false positives and increase statistical power when differences are small, while increasing reward more when differences are large. The work highlights important considerations for future Statistically Sensitive algorithm development that balances reward and statistical analysis in adaptive experimentation.
Problem

Research questions and friction points this paper is trying to address.

Balancing reward maximization against valid statistical inference
Reducing false positives and false negatives in adaptive experiments
Adaptively mixing uniform random assignment with Thompson Sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixes Thompson Sampling with uniform random assignment
Adaptive step sets the probability of UR from the posterior probability that the arm difference is small
Balances reward maximization against statistical inference trade-offs
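The mixing rule described above can be sketched for the two-armed Beta-Bernoulli setting the paper evaluates. This is an illustrative reconstruction from the abstract, not the authors' code: the class name, the threshold parameter `c`, and the Monte Carlo estimate of the posterior probability of a small difference are all assumptions.

```python
import random

class TSPostDiffSketch:
    """Illustrative two-arm Beta-Bernoulli TS-PostDiff.

    With probability equal to the estimated posterior probability
    that the arm means differ by less than the threshold `c`, the
    next assignment is uniform random; otherwise it is a Thompson
    Sampling draw. `c` and the Monte Carlo estimator are stand-ins
    for choices the paper leaves to the experimenter.
    """

    def __init__(self, c=0.1, n_mc=2000, seed=0):
        self.c = c              # experimenter-defined "small difference"
        self.n_mc = n_mc        # Monte Carlo samples for the posterior check
        self.rng = random.Random(seed)
        self.alpha = [1.0, 1.0]  # Beta(1, 1) priors for each arm
        self.beta = [1.0, 1.0]

    def _prob_small_diff(self):
        # Monte Carlo estimate of P(|theta_0 - theta_1| < c) under
        # the current Beta posteriors.
        hits = 0
        for _ in range(self.n_mc):
            t0 = self.rng.betavariate(self.alpha[0], self.beta[0])
            t1 = self.rng.betavariate(self.alpha[1], self.beta[1])
            if abs(t0 - t1) < self.c:
                hits += 1
        return hits / self.n_mc

    def choose_arm(self):
        if self.rng.random() < self._prob_small_diff():
            return self.rng.randrange(2)  # uniform random assignment
        # Thompson Sampling: draw from each posterior, play the argmax.
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alpha, self.beta)]
        return max(range(2), key=draws.__getitem__)

    def update(self, arm, reward):
        # Binary reward (0 or 1) updates the chosen arm's Beta posterior.
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```

When the posteriors of the two arms overlap heavily (as they do early on, or when the true difference is genuinely small), `_prob_small_diff` is large and assignments stay close to 50/50, preserving data for inference; as evidence accumulates that one arm is clearly better, the probability shrinks and TS increasingly steers traffic toward the better arm.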