🤖 AI Summary
Estimating distributional treatment effects (DTE) in randomized experiments faces two key challenges: low accuracy in distributional tails and poor computational scalability on large-scale data. To address these, we propose an end-to-end DTE estimation framework based on a multi-task neural network. Our method jointly incorporates monotonicity shape constraints and multi-threshold label learning, formulating DTE estimation as an ordinal-constrained multi-class classification problem. This design enhances stability in modeling conditional outcome distributions while significantly improving tail sensitivity. Unlike conventional regression-adjustment approaches—which require numerous independent regressions—our framework avoids redundant fitting, yielding superior scalability. Extensive evaluations on synthetic data and real-world applications (a water conservation intervention and an A/B test on a Japanese streaming platform) demonstrate that our method consistently outperforms existing baselines in overall DTE accuracy, tail coverage, and computational efficiency—establishing its readiness for industrial-scale causal inference deployment.
📝 Abstract
We propose a novel multi-task neural network approach for estimating distributional treatment effects (DTE) in randomized experiments. While DTE provides more granular insights into the experiment outcomes over conventional methods focusing on the Average Treatment Effect (ATE), estimating it with regression adjustment methods presents significant challenges. Specifically, precision in the distribution tails suffers due to data imbalance, and computational inefficiencies arise from the need to solve numerous regression problems, particularly in large-scale datasets commonly encountered in industry. To address these limitations, our method leverages multi-task neural networks to estimate conditional outcome distributions while incorporating monotonic shape constraints and multi-threshold label learning to enhance accuracy. To demonstrate the practical effectiveness of our proposed method, we apply our method to both simulated and real-world datasets, including a randomized field experiment aimed at reducing water consumption in the US and a large-scale A/B test from a leading streaming platform in Japan. The experimental results consistently demonstrate superior performance across various datasets, establishing our method as a robust and practical solution for modern causal inference applications requiring a detailed understanding of treatment effect heterogeneity.