A Two-armed Bandit Framework for A/B Testing

📅 2025-07-24
📈 Citations: 0
Influential: 0
📄 PDF

career value

182K/year
🤖 AI Summary
Traditional A/B testing methods suffer from low statistical power and poor detection capability under small sample sizes. To address this, we propose a novel two-armed bandit testing framework that integrates causal inference with reinforcement learning. Our approach innovatively combines doubly robust estimation with dynamic bandit policies, enabling efficient exploration via pseudo-outcome construction, and employs permutation testing for exact p-value computation. We establish theoretical consistency of the proposed test statistic. Extensive simulations and experiments on real-world ride-hailing data demonstrate that, at equal sample sizes, our method achieves significantly higher statistical power—averaging a 23.6% improvement over state-of-the-art baselines—while enhancing the accuracy of policy effect comparisons and decision reliability. The framework is particularly effective in small-sample, high-variance settings, offering a principled solution for robust online experimentation under limited data.

Technology Category

Application Category

📝 Abstract
A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and reinforcement learning methods developed in the literature are applicable to A/B testing. This paper introduces a two-armed bandit framework designed to improve the power of existing approaches. The proposed procedure consists of three main steps: (i) employing doubly robust estimation to generate pseudo-outcomes, (ii) utilizing a two-armed bandit framework to construct the test statistic, and (iii) applying a permutation-based method to compute the $p$-value. We demonstrate the efficacy of the proposed method through asymptotic theories, numerical experiments and real-world data from a ridesharing company, showing its superior performance in comparison to existing methods.
Problem

Research questions and friction points this paper is trying to address.

Improves A/B testing power using two-armed bandit framework
Compares new policy outcomes against standard control efficiently
Enhances causal inference with doubly robust estimation and permutation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Doubly robust estimation for pseudo-outcomes
Two-armed bandit test statistic construction
Permutation-based p-value computation method
🔎 Similar Papers