🤖 AI Summary
This paper addresses ecological inference: estimating group-level means of an individual outcome when only aggregate data are observed, namely the average outcome and the group composition of each aggregation unit (e.g., a geographic area). Existing methods implicitly make strong assumptions about the aggregation process. We formalize weaker identification conditions and propose a debiased machine learning estimator whose nuisance functions are restricted to a partially linear form, which allows efficient control for many covariates. The method also admits a semiparametric sensitivity analysis for violations of the key identifying assumption and, under additional assumptions, asymptotically valid confidence intervals for local, unit-level estimates. Simulations and validation on real-world data with known ground truth show advantages over existing methods. An open-source software implementation is provided.
📝 Abstract
We introduce a new method for estimating the mean of an outcome variable within groups when researchers only observe the average of the outcome and group indicators across a set of aggregation units, such as geographical areas. Existing methods for this problem, also known as ecological inference, implicitly make strong assumptions about the aggregation process. We first formalize weaker conditions for identification, which motivate estimators that can efficiently control for many covariates. We propose a debiased machine learning estimator that is based on nuisance functions restricted to a partially linear form. Our estimator also admits a semiparametric sensitivity analysis for violations of the key identifying assumption, as well as asymptotically valid confidence intervals for local, unit-level estimates under additional assumptions. Simulations and validation on real-world data where ground truth is available demonstrate the advantages of our approach over existing methods. Open-source software implementing the proposed methods is available.
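A toy two-group example can illustrate why controlling for covariates matters in this setting. The sketch below is not the authors' estimator or their software. It is a generic cross-fitted partialling-out (Robinson-style double/debiased ML) regression, run on hypothetical simulated data with one scalar covariate, and a k-nearest-neighbour learner stands in for the nuisance models. Each unit j reports only the aggregate Ȳ_j = X_j B1_j + (1 − X_j) B0_j, where X_j is the share of group 1 and B0_j, B1_j are the unobserved unit-level group means. The sketch assumes that these means are mean-independent of X_j given the covariate Z_j and that their gap θ is constant. Under those assumptions E[Ȳ | X, Z] = b0(Z) + θX, which is a partially linear form. All function names and simulation settings are illustrative assumptions.

```python
import random
import statistics


def knn_predict(z_train, v_train, z_test, k=25):
    """k-nearest-neighbour regression on a scalar covariate (stand-in nuisance learner)."""
    preds = []
    for z0 in z_test:
        # indices of the k training units whose covariate is closest to z0
        nearest = sorted(range(len(z_train)), key=lambda i: abs(z_train[i] - z0))[:k]
        preds.append(statistics.fmean(v_train[i] for i in nearest))
    return preds


def dml_partially_linear(y, x, z, n_folds=2, k=25, seed=0):
    """Cross-fitted partialling-out estimate of theta in E[y | x, z] = g(z) + theta * x."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    y_res, x_res = [0.0] * n, [0.0] * n
    for f, test in enumerate(folds):
        # fit the nuisances E[y | z] and E[x | z] on the other folds only
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        z_tr = [z[i] for i in train]
        z_te = [z[i] for i in test]
        m_y = knn_predict(z_tr, [y[i] for i in train], z_te, k)
        m_x = knn_predict(z_tr, [x[i] for i in train], z_te, k)
        for j, i in enumerate(test):
            y_res[i] = y[i] - m_y[j]
            x_res[i] = x[i] - m_x[j]
    # regress the residualized aggregate outcome on the residualized group share
    theta = sum(a * b for a, b in zip(x_res, y_res)) / sum(a * a for a in x_res)
    # plug-in (sandwich) standard error from the orthogonal score
    psi = [a * (b - theta * a) for a, b in zip(x_res, y_res)]
    j_hat = statistics.fmean(a * a for a in x_res)
    se = (statistics.fmean(p * p for p in psi) / j_hat ** 2 / n) ** 0.5
    return theta, se


def ols_slope(y, x):
    """Naive Goodman-style regression slope of y on x, ignoring covariates."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)


def simulate(n=800, theta=0.25, seed=1):
    """Hypothetical data: Z drives both the group share and the baseline outcome."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zj = rng.random()
        xj = 0.2 + 0.6 * zj + rng.uniform(-0.15, 0.15)  # group-1 share, within [0.05, 0.95]
        b0 = 0.2 + 0.3 * zj ** 2 + rng.gauss(0, 0.05)  # unit-level group-0 mean
        b1 = 0.2 + 0.3 * zj ** 2 + theta + rng.gauss(0, 0.05)  # unit-level group-1 mean
        y.append(xj * b1 + (1 - xj) * b0)  # only the aggregate outcome is observed
        x.append(xj)
        z.append(zj)
    return y, x, z


if __name__ == "__main__":
    y, x, z = simulate()
    theta_hat, se = dml_partially_linear(y, x, z)
    print(f"cross-fitted theta: {theta_hat:.3f} (SE {se:.3f})")
    print(f"naive Goodman slope: {ols_slope(y, x):.3f}")
```

In this simulated design the covariate raises both the group-1 share and the group-0 baseline. The naive slope is therefore biased upward (by about 0.4 in expectation under these settings), while the cross-fitted estimate should land near the true gap of 0.25. Cross-fitting ensures that each unit's nuisance predictions come from models fit without that unit.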