🤖 AI Summary
Short-term A/B tests in online experimentation are vulnerable to time-varying nonstationarity, limiting their ability to accurately estimate long-term system effects; conversely, long-term experiments suffer from slow iteration cycles and poor scalability in large action spaces. To address this, we propose a sequential Bayesian optimization framework that integrates fast and slow online experiments with offline proxy evaluation. Specifically, short-cycle, biased online experiments, combined with off-policy evaluation (OPE), provide rapid but noisy and biased feedback, while long-cycle, unbiased experiments calibrate long-term impact. A unified Bayesian model jointly assimilates asynchronous, heterogeneous observations from these sources, enabling adaptive exploration-exploitation trade-offs. Our approach preserves accurate long-term effect estimation while substantially reducing optimization latency, thereby improving both decision-making efficiency and robustness in large-scale internet systems.
📝 Abstract
Online experiments in internet systems, also known as A/B tests, are used for a wide range of system tuning problems, such as optimizing recommender system ranking policies and learning adaptive streaming controllers. Decision-makers generally wish to optimize for the long-term treatment effects of system changes, which often requires running experiments for a long time, since short-term measurements can be misleading due to non-stationarity in treatment effects over time. Sequential experimentation strategies, which typically involve several iterations, can be prohibitively long in such cases. We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) and/or offline proxies (e.g., off-policy evaluation) with long-running, slow experiments to perform sequential, Bayesian optimization over large action spaces in a short amount of time.
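The core idea, jointly modeling cheap biased observations and scarce unbiased ones, can be illustrated with a minimal toy sketch. This is not the paper's actual model: the constant-bias assumption, the polynomial basis, and all names (`f`, `predict_long_term`, etc.) are illustrative choices. Fast-experiment measurements are treated as the true long-term effect plus a learned offset, while slow-experiment measurements anchor the unbiased level; a single MAP fit (ridge regression) recovers both.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: the long-term effect f(x) of a tuning parameter x.
def f(x):
    return np.sin(3 * x)

# Fast experiments: plentiful and cheap, but biased (here, a constant +0.5) and noisy.
x_fast = rng.uniform(0, 1, 40)
y_fast = f(x_fast) + 0.5 + rng.normal(0, 0.2, x_fast.size)

# Slow experiments: unbiased but scarce and expensive.
x_slow = rng.uniform(0, 1, 5)
y_slow = f(x_slow) + rng.normal(0, 0.05, x_slow.size)

def features(x):
    # Polynomial basis for the shared latent effect function.
    return np.vstack([x**k for k in range(6)]).T

# Joint design matrix: shared basis columns, plus an indicator column that is 1
# only for fast observations, letting the model learn their bias from the data.
Phi = np.vstack([features(x_fast), features(x_slow)])
ind = np.concatenate([np.ones(x_fast.size), np.zeros(x_slow.size)])[:, None]
X = np.hstack([Phi, ind])
y = np.concatenate([y_fast, y_slow])

# MAP estimate under a Gaussian prior on the weights (ridge regression).
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

learned_bias = w[-1]          # should land near the true +0.5 offset

def predict_long_term(x):
    # Debiased prediction of the long-term effect at new parameter values.
    return features(np.atleast_1d(x)) @ w[:-1]
```

The fast data pins down the *shape* of the effect surface while the few slow observations pin down its *level*; in the paper's setting this role split is what lets long-term calibration proceed without waiting for slow experiments at every candidate action. A full implementation would replace the fixed basis with a Gaussian process and drive the next experiment choice with an acquisition function.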