Strategic A/B testing via Maximum Probability-driven Two-armed Bandit

📅 2025-06-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In large-scale A/B testing, small yet economically meaningful average treatment effects (ATEs) are hard to detect because conventional tests have low statistical power against them. To address this, we propose a maximum probability-driven two-armed bandit (TAB) testing framework that combines a counterfactual outcome model, a weighted mean volatility statistic, and permutation testing. Its theoretical basis is the "strategic central limit theorem" (SCLT), which shows that the test statistic is more concentrated under the null hypothesis and less concentrated under the alternative, so power against small effects rises while Type I error stays controlled. Experiments indicate a significant improvement in power over traditional methods, which suggests that comparable power could be reached with smaller samples and shorter experiments, and therefore at lower cost.

📝 Abstract

Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because they are not sensitive enough to small discrepancies. This work leverages a counterfactual outcome framework and proposes a maximum probability-driven two-armed bandit (TAB) process that weights the mean volatility statistic while controlling Type I error. Permutation methods further enhance robustness and efficacy. The established strategic central limit theorem (SCLT) demonstrates that our approach yields a more concentrated distribution under the null hypothesis and a less concentrated one under the alternative hypothesis, greatly improving statistical power. The experimental results indicate a significant improvement in A/B testing, highlighting the potential to reduce experimental costs while maintaining high statistical power.

Problem

Research questions and friction points this paper is trying to address.

Detecting minor average treatment effects in large-scale applications

Improving sensitivity to small discrepancies in A/B testing

Reducing experimental costs while maintaining high statistical power
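
To make the cost of detecting a minor effect concrete, the sketch below computes the per-arm sample size of a standard two-sided, two-sample z-test, n = 2 (z_{1-α/2} + z_{power})² σ² / δ². This is the textbook normal-approximation baseline, not the paper's method; the function name `required_n_per_arm` and the values of `delta`, `sigma`, `alpha`, and `power` are illustrative assumptions.

```python
import math
from statistics import NormalDist


def required_n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided two-sample z-test.

    Assumes equal arm sizes, a common known standard deviation `sigma`,
    and a true difference in means of `delta` (all hypothetical inputs).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Shrinking the effect by 10x raises the required sample size by about 100x.
    for delta in (0.1, 0.01):
        print(delta, required_n_per_arm(delta, sigma=1.0))
```

Because the required sample size scales with 1/δ², halving the detectable effect quadruples the traffic a conventional test needs, which is the friction point listed above.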

Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum probability-driven two-armed bandit process

Weighted mean volatility statistic controls Type I error

Permutation methods enhance robustness and efficacy
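
As a rough illustration of the permutation idea, here is a minimal sketch of a generic two-sample permutation test on the difference in means. It does not implement the paper's weighted mean volatility statistic or the TAB process, whose details are not given here; the function name `permutation_test`, the choice of statistic, and the defaults `n_perm` and `seed` are assumptions made for the example.

```python
import random
from statistics import fmean


def permutation_test(treatment, control, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value). The statistic is a plain
    difference in means, chosen only for illustration.
    """
    rng = random.Random(seed)
    observed = fmean(treatment) - fmean(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    hits = 0
    for _ in range(n_perm):
        # Under the null of no effect, group labels are exchangeable,
        # so reshuffling the pooled outcomes simulates the null distribution.
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_t]) - fmean(pooled[n_t:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    data_rng = random.Random(1)
    control = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
    treatment = [data_rng.gauss(0.3, 1.0) for _ in range(200)]
    print(permutation_test(treatment, control))
```

The null distribution is built from the data themselves, so the test does not rely on a normal approximation for its Type I error control.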

🔎 Similar Papers

Ranking by Lifts: A Cost-Benefit Approach to Large-Scale A/B Tests