A Statistically Reliable Optimization Framework for Bandit Experiments in Scientific Discovery

📅 2026-03-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the statistical invalidity of classical hypothesis tests (such as the t-test) when applied to data collected via adaptive sampling in multi-armed bandit (MAB) settings, where unadjusted tests suffer inflated Type I and Type II error rates and no unified framework exists for balancing cumulative reward against statistical power. The authors propose a statistically sound optimization framework that jointly modifies the null hypothesis and adjusts the critical region, thereby restoring the validity of classical tests on adaptively collected data. They further introduce an objective function based on the cost of extending the experiment, which provides a uniform way to evaluate and select MAB algorithms. By integrating hypothesis modification with critical region correction, the method improves statistical power while preserving validity; simulations demonstrate superior performance over existing approaches, achieving substantially higher-quality results with only a modest increase in experiment duration.
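As a quick illustration of the first problem the summary describes (not the paper's code), the minimal sketch below runs a two-armed Thompson-sampling bandit whose arms share the same mean, then applies an uncorrected Welch $t$-test to the adaptively collected data; the empirical rejection rate typically exceeds the nominal 5% level, i.e., the Type I error inflation that motivates the paper. The reward model, prior, and parameters are illustrative assumptions.

```python
# Sketch: why a naive t-test becomes miscalibrated under adaptive sampling.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, horizon, n_sims = 0.05, 200, 2000
rejections = 0

for _ in range(n_sims):
    obs = [[], []]  # rewards observed per arm; both arms have mean 0 (null is true)
    for _ in range(horizon):
        # Thompson sampling with a N(0, 1) prior on each arm mean and known
        # reward noise variance 1: posterior is N(sum(o)/(n+1), 1/(n+1)).
        draws = [
            rng.normal(sum(o) / (len(o) + 1), 1.0 / np.sqrt(len(o) + 1))
            for o in obs
        ]
        arm = int(np.argmax(draws))
        obs[arm].append(rng.normal(0.0, 1.0))
    if min(len(obs[0]), len(obs[1])) > 1:
        _, p_value = stats.ttest_ind(obs[0], obs[1], equal_var=False)
        rejections += p_value < alpha

print(f"empirical Type I error: {rejections / n_sims:.3f} vs nominal {alpha}")
```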

📝 Abstract
Scientific experimentation is largely driven by statistical hypothesis testing to determine whether interventions differ significantly. Traditionally, experimenters allocate samples uniformly across interventions. However, such an approach may lead to suboptimal outcomes; multi-armed bandits (MABs) address this problem by allocating samples adaptively to maximize outcomes. Yet two challenges have hindered the use of MABs in scientific domains. First, common hypothesis tests (e.g., $t$-tests) become invalid under adaptive sampling without correction, leading to inflated Type I and Type II errors. This problem is understudied, and prior solutions suffer from issues such as low statistical power that prevent adoption in many practical settings. Second, practitioners must explicitly balance cumulative reward with statistical efficiency, yet no general methodology exists to quantify this trade-off across algorithms. In this paper, we study assumption modification and critical region correction approaches to hypothesis testing that enable common tests to be applied to adaptively collected data. We provide a heuristic justification for the power efficiency of the combined approach and show in simulation that it achieves higher power than existing approaches. Further, we derive a theoretically and practically motivated objective function for adaptive experiment evaluation, which we integrate into a unified experimental framework. Our framework asks experimenters to specify an experiment extension cost for their problem; based on that cost, our proposed optimization procedure selects the bandit algorithm that best balances reward and power in their setting. We show that our approach enables practitioners to improve outcomes with only slightly more steps than uniform randomization, while retaining statistical validity.
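The paper derives a specific extension-cost objective for selecting among bandit algorithms; its exact form is not reproduced here. As a purely hypothetical sketch of the kind of selection step such an objective enables, the snippet below scores candidate algorithms by trading expected cumulative reward against a user-specified per-step extension cost. The cost form, numbers, and algorithm names are placeholders, not the authors' definitions.

```python
# Hypothetical selection sketch: NOT the paper's objective. Each candidate is
# summarized by (expected cumulative reward at the planned horizon, expected
# extra steps beyond that horizon needed to reach the target statistical power).
candidates = {
    "uniform_randomization": (100.0, 0),   # baseline: reaches power at the horizon
    "thompson_sampling":     (140.0, 60),  # more reward, but needs more steps
    "epsilon_greedy":        (125.0, 25),
}

extension_cost_per_step = 0.4  # specified by the experimenter for their problem

def score(reward, extra_steps):
    """Reward gained minus the cost of extending the experiment (assumed linear)."""
    return reward - extension_cost_per_step * extra_steps

best = max(candidates, key=lambda name: score(*candidates[name]))
print("selected algorithm:", best)  # thompson_sampling under these placeholder numbers
```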
Problem

Research questions and friction points this paper is trying to address.

multi-armed bandits
statistical hypothesis testing
adaptive sampling
type I error
statistical power
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-armed bandits
adaptive sampling
statistical validity
hypothesis testing
power efficiency
🔎 Similar Papers
No similar papers found.
Tong Li
University of Toronto
Travis Mandel
University of Hawai‘i at Hilo
Goldie Phillips
Independent Researcher
Anna Rafferty
Professor of Computer Science, Carleton College
artificial intelligence, cognitive science, computational modeling, education, machine learning
Eric M. Schwartz
University of Michigan
Dehan Kong
University of Toronto
Joseph J. Williams
University of Toronto