🤖 AI Summary
In large-scale A/B testing, small but economically meaningful average treatment effects (ATEs) are hard to detect because standard tests have low statistical power against them. To address this, we propose a maximum probability-driven two-armed bandit (TAB) testing framework that combines a counterfactual outcome model, a weighted mean volatility statistic, and permutation methods. The central theoretical result is the strategic central limit theorem (SCLT), which shows that the test statistic is more concentrated under the null hypothesis and less concentrated under the alternative, so statistical power increases while Type I error stays controlled. Experiments indicate a significant improvement in A/B testing, pointing to the potential to reduce experimental costs while maintaining high statistical power.
📝 Abstract
Detecting a minor average treatment effect is a major challenge in large-scale applications, where even minimal improvements can have a significant economic impact. Traditional methods, reliant on normal distribution-based or expanded statistics, often fail to identify such minor effects because they are not sensitive enough to small discrepancies. This work leverages a counterfactual outcome framework and proposes a maximum probability-driven two-armed bandit (TAB) process by weighting the mean volatility statistic, which controls Type I error. Permutation methods further enhance robustness and efficacy. The established strategic central limit theorem (SCLT) demonstrates that our approach yields a more concentrated distribution under the null hypothesis and a less concentrated one under the alternative hypothesis, greatly improving statistical power. The experimental results indicate a significant improvement in A/B testing, highlighting the potential to reduce experimental costs while maintaining high statistical power.
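For readers unfamiliar with the permutation methods the abstract mentions, the sketch below shows a generic two-sample permutation test on the difference in means. It is not the paper's TAB procedure or its weighted mean volatility statistic; the function name `permutation_test`, its parameters, and the choice of test statistic are assumptions made purely for illustration.

```python
import random


def permutation_test(control, treatment, n_permutations=2000, seed=0):
    """Generic two-sided permutation test on the difference in means.

    Illustrative only: this is a textbook permutation test, not the
    TAB-based statistic described in the abstract.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    n_control = len(control)
    observed = mean(treatment) - mean(control)

    # Under the null hypothesis the group labels are exchangeable,
    # so we pool the outcomes and reassign labels at random.
    pooled = list(control) + list(treatment)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            exceed += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value
```

Calling `permutation_test(control_outcomes, treatment_outcomes)` returns the observed mean difference and an estimated p-value. Because the null distribution is built from the data itself, the test does not rely on a normal approximation, which is one common reason for using permutation methods when effects are small.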