Estimation of Treatment Effects Under Nonstationarity via Truncated Difference-in-Q's

📅 2025-06-05

📈 Citations: 0

✨ Influential: 0

career value

195K/year

🤖 AI Summary

In nonstationary dynamic environments—such as recommendation systems and digital healthcare—conventional A/B testing yields biased and high-variance estimates of the global average treatment effect (GATE). To address this, we propose the truncated differential Q-estimator (truncated DQ), the first method to theoretically link truncated trajectory differences to policy gradients, proving it yields a first-order unbiased approximation of GATE. Under a nonstationary Markov setting, we derive tight bias–variance bounds. Our approach integrates Bernoulli randomization, truncated time-series analysis, Q-function modeling, and offline policy evaluation. Empirical evaluation on hospital scheduling and ride-hailing simulation benchmarks demonstrates that the calibrated truncated DQ estimator significantly outperforms baseline methods—including direct modeling (DM) and standard differential Q (DQ)—by substantially reducing both estimation bias and variance.

Technology Category

Application Category

📝 Abstract

Randomized controlled experiments (''A/B testing'') are fundamental for assessing interventions in dynamic technology-driven environments, such as recommendation systems, online marketplaces, and digital health interventions. In these systems, interventions typically impact not only the current state of the system, but also future states; therefore, accurate estimation of the global average treatment effect (or GATE) from experiments requires accounting for the dynamic temporal behavior of the system. To address this, recent literature has analyzed a range of estimators applied to Bernoulli randomized experiments in stationary environments, ranging from the standard difference-in-means (DM) estimator to methods building on reinforcement learning techniques, such as off-policy evaluation and the recently proposed difference-in-Q's (DQ) estimator. However, all these estimators exhibit high bias and variance when the environment is nonstationary. This paper addresses the challenge of estimation under nonstationarity. We show that a simple extension of the DM estimator using differences in truncated outcome trajectories yields favorable bias and variance in nonstationary Markovian settings. Our theoretical analysis establishes this result by first showing that the truncated estimator is in fact estimating an appropriate policy gradient that can be expressed as a difference in Q-values; thus we refer to our estimator as the truncated DQ estimator (by analogy to the DQ estimator). We then show that the corresponding policy gradient is a first-order approximation to the GATE. Combining these insights yields our bias and variance bounds. We validate our results through synthetic and realistic simulations-including hospital and ride-sharing settings-and show that a well-calibrated truncated DQ estimator achieves low bias and variance even in nonstationary environments.

Problem

Research questions and friction points this paper is trying to address.

Estimating treatment effects in nonstationary dynamic systems

Reducing bias and variance in nonstationary Markovian settings

Validating truncated DQ estimator for accurate GATE approximation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Truncated DQ estimator for nonstationarity

Policy gradient as first-order GATE approximation

Validated via synthetic and realistic simulations

🔎 Similar Papers

Testing Stationarity and Change Point Detection in Reinforcement Learning