Long-Term Causal Inference with Many Noisy Proxies

📅 2026-01-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of estimating long-term causal effects in digital platform experiments, where such effects must be inferred from many noisy short-term proxy variables that reflect a low-dimensional latent mediator. The authors formulate this as a latent variable problem and propose regularized regression methods, such as ridge regression, to distill information from the high-dimensional proxies. Theoretical analysis shows that the bias of ridge regression decreases as proxies are added and yields closed-form expressions for the bias-variance tradeoff. Theory and simulations indicate that regularized regression substantially outperforms naive proxy selection, and the method is illustrated with an empirical application to the California GAIN experiment.

📝 Abstract

We propose a method for estimating long-term treatment effects with many short-term proxy outcomes: a central challenge when experimenting on digital platforms. We formalize this challenge as a latent variable problem where observed proxies are noisy measures of a low-dimensional set of unobserved surrogates that mediate treatment effects. Through theoretical analysis and simulations, we demonstrate that regularized regression methods substantially outperform naive proxy selection. We show in particular that the bias of Ridge regression decreases as more proxies are added, with closed-form expressions for the bias-variance tradeoff. We illustrate our method with an empirical application to the California GAIN experiment.
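
The contrast between naive proxy selection and ridge regression over all proxies can be illustrated with a small simulation. The following is a minimal sketch under assumptions of our own, not the paper's estimator or notation: a single latent surrogate `z`, Gaussian noise, a historical sample in which the long-term outcome is observed, and hypothetical values for `loadings`, `beta`, `shift`, `noise_sd`, and `alpha` (the ridge penalty is not tuned).

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def compare_estimators(p=200, n_hist=5000, n_exp=20000, noise_sd=2.0,
                       alpha=10.0, seed=0):
    """Toy comparison: one latent surrogate z, p noisy proxies of it."""
    rng = np.random.default_rng(seed)
    loadings = rng.standard_normal(p)  # hypothetical proxy loadings on z
    beta = 1.0    # assumed effect of z on the long-term outcome
    shift = 0.5   # assumed effect of treatment on z
    true_effect = beta * shift

    def proxies(z):
        # Each proxy is a noisy linear measurement of the latent surrogate.
        noise = noise_sd * rng.standard_normal((z.size, p))
        return np.outer(z, loadings) + noise

    # Historical sample: proxies and the long-term outcome are both observed.
    z_hist = rng.standard_normal(n_hist)
    x_hist = proxies(z_hist)
    y_hist = beta * z_hist + rng.standard_normal(n_hist)

    # Experiment: only treatment assignment and short-term proxies are observed.
    treat = rng.random(n_exp) < 0.5
    x_exp = proxies(rng.standard_normal(n_exp) + shift * treat)

    def effect_from(cols, penalty):
        # Fit outcome-on-proxies in historical data, then difference the
        # predicted long-term outcome between treatment arms.
        coef, intercept = ridge_fit(x_hist[:, cols], y_hist, penalty)
        pred = x_exp[:, cols] @ coef + intercept
        return float(pred[treat].mean() - pred[~treat].mean())

    # Naive selection: keep only the proxy most correlated with the outcome.
    corr = [abs(np.corrcoef(x_hist[:, j], y_hist)[0, 1]) for j in range(p)]
    best = [int(np.argmax(corr))]

    return {
        "true": true_effect,
        "single_proxy": effect_from(best, 0.0),
        "ridge_all": effect_from(np.arange(p), alpha),
    }


result = compare_estimators()
print(result)
```

In this toy setup the single-proxy estimate is attenuated toward zero because measurement noise in the chosen proxy dilutes its fitted coefficient, while pooling all proxies averages that noise away. Rerunning with smaller `p` gives a rough sense of how the attenuation shrinks as proxies are added.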

Problem

Research questions and friction points this paper is trying to address.

long-term causal inference

noisy proxies

treatment effects

latent variable

proxy outcomes

Innovation

Methods, ideas, or system contributions that make the work stand out.

long-term causal inference

noisy proxies

latent variable model