🤖 AI Summary
Capacity constraints in online A/B testing often necessitate interim analyses, but repeatedly "peeking" at accumulating data inflates the Type I error rate and undermines the validity of conclusions. To address this, we propose a framework for safe interim evaluation based on Bayesian predictive probabilities. The method estimates these probabilities without numerical integration, which supports deployment at scale and ongoing experiment health monitoring, and it balances statistical validity against engineering practicality. Using experiment data from Instagram, we show how the system supports interim decisions that stay aligned with end-of-experiment outcomes. Motivated by at-scale deployment within an experimentation platform, the work also provides system design recommendations aimed at improving both experimental fidelity and resource efficiency.
📝 Abstract
The widespread adoption of online randomized controlled experiments (A/B tests) for decision-making has created ongoing capacity constraints that necessitate interim analyses. As a consequence, platform users are increasingly motivated to optimize limited resources through ad hoc peeking. Such practices, however, are error-prone and often misaligned with end-of-experiment outcomes (e.g., they inflate the Type I error rate). We introduce a system based on Bayesian predictive probabilities that enables interim analyses without compromising the fidelity of the experiment. This idea has been widely used outside the technology domain to make decisions in experiments more efficiently. Motivated by at-scale deployment within an experimentation platform, we demonstrate how predictive probabilities can be estimated without numerical integration, recommend systems for studying their properties at scale as an ongoing health check, and offer system design recommendations. We illustrate the practical benefits using experiment data from Instagram.
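To make "estimated without numerical integration" concrete, here is a minimal sketch of one textbook closed form. It is not necessarily the estimator used in the paper. It assumes a normally distributed test statistic, a flat prior on the treatment effect, and a one-sided final test at level `alpha`. Under those assumptions, the predictive probability of a significant final result is Φ((z_t − z_crit·√t) / √(1 − t)), where z_t is the interim z-statistic and t is the fraction of planned information observed so far. The names `predictive_probability`, `z_interim`, and `info_frac` are illustrative and do not come from the paper.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def predictive_probability(z_interim: float, info_frac: float, alpha: float = 0.05) -> float:
    """Predictive probability that the final one-sided z-test rejects at level alpha.

    Assumptions (illustrative, not taken from the paper): normal test statistic,
    flat prior on the effect, and info_frac = interim sample size / planned size.
    """
    if not 0.0 < info_frac < 1.0:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Critical value the final z-statistic has to exceed.
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha)
    # Closed form: the final z-statistic given the interim one is normal with
    # mean z_interim / sqrt(t) and variance (1 - t) / t, so no integration is needed.
    return _STD_NORMAL.cdf((z_interim - z_crit * sqrt(info_frac)) / sqrt(1.0 - info_frac))


if __name__ == "__main__":
    # Interim z-statistic of 1.0 with half of the planned sample collected.
    print(predictive_probability(z_interim=1.0, info_frac=0.5))
```

Under these assumptions, an interim statistic of exactly zero at the halfway point gives a predictive probability equal to `alpha`. A stopping rule could compare the returned value against futility and efficacy thresholds, though the choice of thresholds is outside the scope of this sketch.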