Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning

📅 2026-02-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Traditional time series A/B testing struggles to achieve optimal treatment allocation due to its neglect of full historical dependencies and reliance on strong modeling assumptions. This work proposes an end-to-end framework that integrates Transformers with reinforcement learning: the Transformer captures complex temporal dependencies across the entire history, while reinforcement learning directly optimizes the mean squared error of treatment effect estimation without imposing restrictive functional form assumptions. Theoretical analysis establishes, for the first time, that omitting historical information leads to suboptimal experimental designs. Extensive experiments demonstrate substantial improvements over existing methods across synthetic data, a public scheduling simulator, and real-world ride-hailing datasets, significantly enhancing both estimation accuracy and experimental efficiency.

πŸ“ Abstract
A/B testing has become a gold standard for modern technological companies to conduct policy evaluation. Yet, its application to time series experiments, where policies are sequentially assigned over time, remains challenging. Existing designs suffer from two limitations: (i) they do not fully leverage the entire history for treatment allocation; (ii) they rely on strong assumptions to approximate the objective function (e.g., the mean squared error of the estimated treatment effect) for optimizing the design. We first establish an impossibility theorem showing that failure to condition on the full history leads to suboptimal designs, due to the dynamic dependencies in time series experiments. To address both limitations simultaneously, we next propose a transformer reinforcement learning (RL) approach which leverages transformers to condition allocation on the entire history and employs RL to directly optimize the MSE without relying on restrictive assumptions. Empirical evaluations on synthetic data, a publicly available dispatch simulator, and a real-world ridesharing dataset demonstrate that our proposal consistently outperforms existing designs.
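The abstract describes allocating treatments by conditioning on the full experiment history via attention, then tuning the allocation policy to directly minimize the MSE of the treatment-effect estimate. The toy sketch below illustrates that idea only; it is not the paper's implementation. The embedding width, the outcome model with a one-step carryover (which creates the history dependence), and the zeroth-order evolution-strategies update standing in for the paper's RL procedure are all illustrative assumptions.

```python
import numpy as np

D = 4  # embedding width per history step (assumed, not from the paper)

def attention_policy(theta, hist):
    """Single-head attention over the history -> P(treat at time t).

    theta packs an embedding matrix W (3 -> D), a query q, a readout w,
    and a bias b; these shapes are hypothetical.
    """
    W = theta[:3 * D].reshape(3, D)
    q = theta[3 * D:4 * D]
    w = theta[4 * D:5 * D]
    b = theta[5 * D]
    if len(hist) == 0:
        return 0.5  # no history yet: allocate uniformly at random
    H = np.asarray(hist) @ W                    # embed (A, Y, t/T) rows
    scores = H @ q / np.sqrt(D)
    a = np.exp(scores - scores.max())
    a /= a.sum()                                # attention weights over history
    ctx = a @ H                                 # context vector
    return 1.0 / (1.0 + np.exp(-(ctx @ w + b)))

def episode_mse(theta, T=20, tau=1.0, carry=0.8, n_rep=50, seed=1):
    """Monte-Carlo MSE of a difference-in-means estimator of tau under
    the policy. The carryover term carry * A_{t-1} is what makes
    history-aware allocation matter."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_rep):
        hist, prev_a = [], 0.0
        for t in range(T):
            p = attention_policy(theta, hist)
            a = float(rng.random() < p)
            y = 1.0 + tau * a + carry * prev_a + 0.3 * rng.normal()
            hist.append([a, y, (t + 1) / T])
            prev_a = a
        arr = np.asarray(hist)
        A, Y = arr[:, 0], arr[:, 1]
        if A.sum() in (0.0, float(T)):          # degenerate: one arm unused
            errs.append(tau ** 2)
            continue
        tau_hat = Y[A == 1].mean() - Y[A == 0].mean()
        errs.append((tau_hat - tau) ** 2)
    return float(np.mean(errs))

def train(iters=20, pop=6, sigma=0.2, lr=0.3, seed=0):
    """Zeroth-order (evolution-strategies) minimization of the MSE,
    standing in for the paper's RL optimizer."""
    rng = np.random.default_rng(seed)
    theta = 0.1 * rng.normal(size=5 * D + 1)
    for it in range(iters):
        eps = rng.normal(size=(pop, theta.size))
        losses = np.array([episode_mse(theta + sigma * e, seed=it)
                           for e in eps])
        grad = eps.T @ (losses - losses.mean()) / (pop * sigma)
        theta -= lr * grad
    return theta

if __name__ == "__main__":
    theta = train()
    print("post-training MSE:", episode_mse(theta))
```

The key point the sketch shares with the paper's framing: the allocation probability at each step depends on the entire history through attention, and the objective being optimized is the estimator's MSE itself rather than a surrogate derived from modeling assumptions.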
Problem

Research questions and friction points this paper is trying to address.

A/B testing
time series experiments
treatment allocation
dynamic dependencies
experimental design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer
Reinforcement Learning
Time Series Experiments
A/B Testing
Treatment Allocation
Xiangkun Wu
School of Mathematical Sciences, Zhejiang University, Hangzhou, China
Qianglin Wen
Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming, China
Yingying Zhang
East China Normal University
Subgroup Analysis, Quantile Regression, Tensor Learning, Reinforcement Learning
Hongtu Zhu
Kenan Distinguished Professor, The University of North Carolina at Chapel Hill
Medical Imaging Analysis, Statistical Learning, Machine Learning, AI for Two-sided Markets
Ting Li
School of Statistics and Data Science, Shanghai University of Finance and Economics, Shanghai, China
Chengchun Shi
London School of Economics and Political Science
Large Language Models, Reinforcement Learning, Statistics