Efficient Multi-Cohort Inference for Long-Term Effects and Lifetime Value in A/B Testing with User Learning

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional A/B tests, limited to short observation windows, often miss an intervention's long-term impact on user retention and lifetime value, which can lead to wrong decisions. This work proposes a framework that uses multiple short-term experimental cohorts and models user learning with a parametric decay to infer Long-Term Effects (LTE) and the change in Expected Remaining Lifetime Value (ΔERLV). An inverse-variance-weighted estimator combines estimates across cohorts, so that steady-state impact and cumulative value can be estimated within a single experiment. The approach improves estimation precision and identifies scenarios in which relying on short-term or long-term metrics alone would lead to incorrect product decisions.
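
As a rough illustration of the inverse-variance weighting mentioned above, the sketch below combines per-cohort effect estimates for a single time period. It assumes the cohort estimates are independent and that their standard errors are known; the function name `ivw_combine` and all numbers are hypothetical, and the paper's actual estimator may differ in its details.

```python
from math import sqrt


def ivw_combine(estimates, std_errors):
    """Combine independent per-cohort effect estimates for one time period.

    Each estimate is weighted by 1 / SE^2, so more precise cohorts count more.
    Returns the combined estimate and its standard error.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / (se * se) for se in std_errors]
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    combined_se = sqrt(1.0 / total_weight)
    return combined, combined_se


# Hypothetical treatment-effect estimates from two cohorts at the same
# time-since-exposure, with their standard errors.
cohort_effects = [0.10, 0.20]
cohort_ses = [0.1, 0.2]

effect, se = ivw_combine(cohort_effects, cohort_ses)
print(f"combined effect = {effect:.3f}, standard error = {se:.4f}")
```

Under the independence assumption, the combined standard error is never larger than the smallest cohort standard error, which is the sense in which pooling cohorts reduces variance.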

📝 Abstract

On streaming platforms, churn is extremely costly, yet A/B tests are typically evaluated using outcomes observed within a limited experimental horizon. Even when both short-term and predicted long-term engagement metrics are considered, they may fail to capture how a treatment affects user retention. Consequently, an intervention may appear beneficial in the short term and neutral in the long term while still generating lower total value than the control because of user churn. To address this limitation, we introduce a method that estimates long-term treatment effects (LTE) and the change in residual lifetime value ($\Delta\mathrm{ERLV}$) in short multi-cohort A/B tests under user learning. To estimate time-varying treatment effects efficiently, we introduce an inverse-variance weighted estimator that combines estimates from multiple cohorts, reducing variance relative to standard approaches in the literature. The estimated treatment trajectory is then modeled as a parametric decay to recover both the asymptotic treatment effect and the cumulative value generated over time. Our framework enables simultaneous evaluation of steady-state impact and residual user value within a single experiment. Empirical results show improved precision in estimating LTE and $\Delta\mathrm{ERLV}$ and identify scenarios in which relying on either short-term or long-term metrics alone would lead to incorrect product decisions.
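
The parametric-decay step can be sketched as follows. The abstract does not state the functional form, so this example assumes an exponential decay toward an asymptote, effect(t) = lte + amp * exp(-rate * t), fitted by a simple grid search over the rate with closed-form least squares for the other two parameters. The function names, the synthetic trajectory, and the 52-period horizon are all hypothetical, and the plain cumulative sum is only a stand-in for cumulative value, not the paper's definition of $\Delta\mathrm{ERLV}$.

```python
from math import exp


def fit_exponential_decay(times, effects, rates):
    """Fit effect(t) = lte + amp * exp(-rate * t).

    For each candidate rate, lte and amp are obtained by ordinary least
    squares; the rate with the smallest squared error is kept.
    """
    n = len(times)
    y_mean = sum(effects) / n
    best = None
    for rate in rates:
        x = [exp(-rate * t) for t in times]
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # regressor is constant, amp is not identifiable
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, effects))
        amp = sxy / sxx
        lte = y_mean - amp * x_mean
        sse = sum((lte + amp * xi - yi) ** 2 for xi, yi in zip(x, effects))
        if best is None or sse < best[0]:
            best = (sse, lte, amp, rate)
    if best is None:
        raise ValueError("no usable candidate rate")
    return best[1], best[2], best[3]


def cumulative_effect(lte, amp, rate, horizon):
    """Sum the fitted effect over periods 1..horizon (no discounting)."""
    return sum(lte + amp * exp(-rate * t) for t in range(1, horizon + 1))


# Synthetic, noiseless trajectory standing in for the combined estimates.
true_lte, true_amp, true_rate = 0.02, 0.08, 0.5
times = list(range(1, 9))
effects = [true_lte + true_amp * exp(-true_rate * t) for t in times]

rate_grid = [0.05 * i for i in range(1, 41)]  # candidate rates 0.05 .. 2.0
lte_hat, amp_hat, rate_hat = fit_exponential_decay(times, effects, rate_grid)
total = cumulative_effect(lte_hat, amp_hat, rate_hat, horizon=52)
print(f"asymptotic effect = {lte_hat:.4f}, decay rate = {rate_hat:.2f}")
print(f"cumulative effect over 52 periods = {total:.4f}")
```

In this parametrization the fitted `lte` plays the role of the asymptotic (steady-state) effect, while the sum over periods captures the transient contribution as well. A fuller treatment would presumably weight each point by its estimated variance and account for retention when accumulating value.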

Problem

Research questions and friction points this paper is trying to address.

long-term treatment effects

lifetime value

user retention

A/B testing

user learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

long-term treatment effects

multi-cohort A/B testing

inverse-variance weighting