🤖 AI Summary
This paper investigates the fundamental limits on what can be learned about individual treatment effects (ITEs) from randomized experiments. Motivated by the observation that ITE prediction intervals are often wide and do not narrow as the sample grows, we derive sharp bounds on the probability mass function of the ITE and characterize when a valid prediction interval is informative and when it can be bounded away from zero. We clarify a critical distinction between ITE prediction intervals and average treatment effect (ATE) confidence intervals: the latter shrink asymptotically with sample size, whereas the former generally do not, because the joint distribution of the potential outcomes is only partially identified. We also examine how Fisher's and Neyman's null hypotheses relate in individual-level inference: one may reject the Neyman null yet still find evidence consistent with the Fisher null. Our results cover binary, continuous, and ordinal outcomes, and provide a theoretical benchmark and a bounds-based framework for personalized causal inference.
📝 Abstract
The individual treatment effect (ITE) is often regarded as the ideal target of inference in causal analyses and has been the focus of several recent studies. In this paper, we describe the intrinsic limits on what can be learned about ITEs given data from large randomized experiments. We consider when a valid prediction interval for the ITE is informative and when it can be bounded away from zero. The joint distribution over potential outcomes is only partially identified from a randomized trial. Consequently, to be valid, an ITE prediction interval must be valid for all joint distributions consistent with the observed data, and hence will in general be wider than the interval that would result from knowing this joint distribution. We characterize prediction intervals in the binary treatment and outcome setting, and extend these insights to models with continuous and ordinal outcomes. We derive sharp bounds on the probability mass function (pmf) of the ITE. Finally, we contrast prediction intervals for the ITE with confidence intervals for the average treatment effect (ATE), which also leads us to consider Fisher versus Neyman null hypotheses. Because the ATE is a population parameter, its confidence intervals shrink with increasing sample size; prediction intervals for the ITE generally do not vanish. This leads to scenarios where one may reject the Neyman null yet still find evidence consistent with the Fisher null, highlighting the challenges of individualized decision-making under partial identification.
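To make the partial-identification point concrete for the binary treatment and outcome setting, the sketch below computes the classical Fréchet-Hoeffding bounds on the pmf of the ITE from the two marginal success probabilities. It is an illustration under stated assumptions rather than the paper's own derivation: the function name `ite_pmf_bounds` and the symbols `p1 = P(Y(1)=1)` and `p0 = P(Y(0)=1)` are introduced here, and both marginals are treated as known exactly, as they would be in the limit of a very large randomized experiment.

```python
def ite_pmf_bounds(p1: float, p0: float) -> dict[int, tuple[float, float]]:
    """Frechet-Hoeffding bounds on P(ITE = d) for binary potential outcomes.

    p1 and p0 are hypothetical names for the marginals P(Y(1)=1) and P(Y(0)=1),
    which a randomized experiment identifies. The joint distribution of
    (Y(1), Y(0)) is not identified, so each pmf value is only bounded.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p0 <= 1.0):
        raise ValueError("p1 and p0 must be probabilities in [0, 1]")
    return {
        # ITE = +1 means Y(1)=1 and Y(0)=0
        1: (max(0.0, p1 - p0), min(p1, 1.0 - p0)),
        # ITE = 0 means Y(1) = Y(0)
        0: (abs(p1 + p0 - 1.0), 1.0 - abs(p1 - p0)),
        # ITE = -1 means Y(1)=0 and Y(0)=1
        -1: (max(0.0, p0 - p1), min(p0, 1.0 - p1)),
    }


if __name__ == "__main__":
    # Assumed example marginals: ATE = p1 - p0 = 0.3
    for effect, (lower, upper) in ite_pmf_bounds(0.7, 0.4).items():
        print(f"P(ITE = {effect:+d}) in [{lower:.2f}, {upper:.2f}]")
```

With the assumed marginals `p1 = 0.7` and `p0 = 0.4`, the ATE is 0.3, yet P(ITE = -1) can be as large as 0.3 and P(ITE = +1) as large as 0.6. Any prediction set that must reach 90% coverage under every compatible joint distribution therefore has to contain all three values, even though the ATE is known exactly. This is the sense in which the prediction interval does not shrink with sample size.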