AI Summary
Traditional time series A/B testing struggles to achieve optimal treatment allocation because it neglects full historical dependencies and relies on strong modeling assumptions. This work proposes an end-to-end framework that integrates Transformers with reinforcement learning: the Transformer captures complex temporal dependencies across the entire history, while reinforcement learning directly optimizes the mean squared error (MSE) of the treatment effect estimate without imposing restrictive functional-form assumptions. Theoretical analysis establishes, for the first time, that omitting historical information leads to suboptimal experimental designs. Experiments on synthetic data, a public dispatch simulator, and a real-world ride-hailing dataset demonstrate consistent gains over existing methods in both estimation accuracy and experimental efficiency.
Abstract
A/B testing has become a gold standard for modern technological companies to conduct policy evaluation. Yet its application to time series experiments, where policies are sequentially assigned over time, remains challenging. Existing designs suffer from two limitations: (i) they do not fully leverage the entire history for treatment allocation; (ii) they rely on strong assumptions to approximate the objective function (e.g., the mean squared error (MSE) of the estimated treatment effect) for optimizing the design. We first establish an impossibility theorem showing that failure to condition on the full history leads to suboptimal designs, due to the dynamic dependencies in time series experiments. To address both limitations simultaneously, we next propose a transformer reinforcement learning (RL) approach that leverages transformers to condition allocation on the entire history and employs RL to directly optimize the MSE without relying on restrictive assumptions. Empirical evaluations on synthetic data, a publicly available dispatch simulator, and a real-world ridesharing dataset demonstrate that our proposal consistently outperforms existing designs.
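To make the two ideas in the abstract concrete, here is a minimal, self-contained sketch of a history-conditioned allocation policy trained by RL to minimize the squared error of a treatment effect estimate. Everything here is illustrative, not the paper's method: a softmax-weighted pooling of past outcomes (`pool_history`) stands in for the Transformer encoder, REINFORCE with a moving-average baseline stands in for the paper's RL algorithm, and the AR(1) carryover environment, the difference-in-means estimator, and all names (`run_episode`, `train`, `theta`) are invented for this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20             # decision points per experiment (episode)
TRUE_EFFECT = 1.0  # ground-truth average treatment effect (ATE)

def pool_history(history):
    """Attention-style pooling: a softmax-weighted summary of the FULL
    outcome history (toy stand-in for a Transformer encoder)."""
    if not history:
        return 0.0
    h = np.asarray(history)
    w = np.exp(h - h.max())
    w /= w.sum()
    return float(w @ h)

def run_episode(theta):
    """Run one experiment. Each allocation conditions on the entire history;
    the episode cost is the squared error of the ATE estimate, so the policy
    is optimized against the MSE directly rather than a modeled surrogate."""
    history, actions = [], []
    grad_logp = np.zeros(2)  # analytic score-function gradient
    carry = 0.0              # AR(1) carryover: outcomes depend on the past
    for _ in range(T):
        feats = np.array([1.0, pool_history(history)])
        p = 1.0 / (1.0 + np.exp(-feats @ theta))  # treatment probability
        a = rng.random() < p
        grad_logp += (a - p) * feats              # d log pi(a) / d theta
        carry = 0.5 * carry + rng.normal(scale=0.3)
        y = TRUE_EFFECT * a + carry + rng.normal(scale=0.3)
        history.append(y)
        actions.append(a)
    acts, ys = np.array(actions), np.array(history)
    if acts.all() or not acts.any():  # degenerate split: estimator undefined
        return TRUE_EFFECT**2, grad_logp
    se = (ys[acts].mean() - ys[~acts].mean() - TRUE_EFFECT) ** 2
    return se, grad_logp

def train(episodes=500, lr=0.02):
    """REINFORCE on the squared error, with a moving-average baseline for
    variance reduction; no functional-form assumption on the MSE is needed."""
    theta, baseline = np.zeros(2), 0.0
    for _ in range(episodes):
        se, g = run_episode(theta)
        baseline = 0.9 * baseline + 0.1 * se
        theta -= lr * (se - baseline) * g  # descend E[squared error]
    return theta

theta = train()
```

The key design point mirrored from the abstract: the allocation probability at each step is a function of the pooled full history (addressing limitation (i)), and the training signal is the realized squared estimation error itself rather than an analytic approximation of it (addressing limitation (ii)).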