Identification and Semiparametric Estimation of Conditional Means from Aggregate Data

📅 2025-09-24
🤖 AI Summary
This paper addresses ecological inference: estimating the mean of an individual-level outcome within groups when only aggregate data are observed (e.g., average outcomes and group shares across geographic areas). Existing methods implicitly make strong assumptions about the aggregation process. The paper formalizes weaker identification conditions and proposes a debiased machine learning estimator whose nuisance functions are restricted to a partially linear form, which allows efficient control for many covariates. The estimator also admits a semiparametric sensitivity analysis for violations of the key identifying assumption and, under additional assumptions, asymptotically valid confidence intervals for local, unit-level estimates. Simulations and validation on real-world data with available ground truth show advantages over existing methods. An open-source software implementation is provided.

📝 Abstract
We introduce a new method for estimating the mean of an outcome variable within groups when researchers only observe the average of the outcome and group indicators across a set of aggregation units, such as geographical areas. Existing methods for this problem, also known as ecological inference, implicitly make strong assumptions about the aggregation process. We first formalize weaker conditions for identification, which motivates estimators that can efficiently control for many covariates. We propose a debiased machine learning estimator that is based on nuisance functions restricted to a partially linear form. Our estimator also admits a semiparametric sensitivity analysis for violations of the key identifying assumption, as well as asymptotically valid confidence intervals for local, unit-level estimates under additional assumptions. Simulations and validation on real-world data where ground truth is available demonstrate the advantages of our approach over existing methods. Open-source software is available which implements the proposed methods.
Problem

Research questions and friction points this paper is trying to address.

Estimating group-level outcome means from aggregated data
Addressing ecological inference with weaker identification assumptions
Developing semiparametric estimators for conditional means
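The setup behind these points can be written compactly. The notation below is assumed for illustration and is not taken from the paper: for aggregation unit i, \bar{X}_{ig} is the share of individuals in group g, \bar{Y}_i is the observed average outcome, and B_{ig} is the unobserved mean outcome of group g within unit i. The first line is an accounting identity; the second is the classical ecological regression assumption; the third is one natural covariate-conditional weakening, stated here as an assumption rather than as the paper's exact condition.

```latex
% Accounting identity: the unit average is a share-weighted mix of group means.
\bar{Y}_i = \sum_{g} \bar{X}_{ig}\, B_{ig}, \qquad \sum_{g} \bar{X}_{ig} = 1

% Classical ecological regression: group means do not vary with group shares.
\mathbb{E}\left[ B_{ig} \mid \bar{X}_i \right] = \beta_g
\;\Longrightarrow\;
\mathbb{E}\left[ \bar{Y}_i \mid \bar{X}_i \right] = \sum_{g} \bar{X}_{ig}\, \beta_g

% Assumed weakening: the same holds only after conditioning on covariates Z_i.
\mathbb{E}\left[ B_{ig} \mid \bar{X}_i, Z_i \right] = \mathbb{E}\left[ B_{ig} \mid Z_i \right] =: \beta_g(Z_i)
\;\Longrightarrow\;
\mathbb{E}\left[ \bar{Y}_i \mid \bar{X}_i, Z_i \right] = \sum_{g} \bar{X}_{ig}\, \beta_g(Z_i)
```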
Innovation

Methods, ideas, or system contributions that make the work stand out.

Debiased machine learning with partially linear form
Semiparametric sensitivity analysis for assumption violations
Asymptotically valid confidence intervals for unit estimates
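To make the first point concrete, here is a minimal sketch of a generic cross-fitted, partially linear (residual-on-residual) estimator applied to simulated aggregate data. It is not the paper's estimator or its software. Everything in it is an assumption made for illustration: the function names, the two-group setup, the single covariate `z`, the polynomial least-squares fit standing in for a machine learning nuisance learner, and the simplifying assumption that the covariate shifts both groups' means by the same amount, so that the coefficient on the group share is the gap between the two group means.

```python
import numpy as np


def poly_features(z, degree=3):
    """Polynomial basis [1, z, z^2, ...]; a simple stand-in for an ML learner."""
    return np.vander(z, N=degree + 1, increasing=True)


def cross_fit_residuals(target, z, n_folds=5, degree=3, seed=0):
    """Out-of-fold residuals target - E[target | z], fit by least squares."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(z)) % n_folds  # balanced random fold labels
    resid = np.empty(len(z), dtype=float)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        coef, *_ = np.linalg.lstsq(
            poly_features(z[train], degree), target[train], rcond=None
        )
        resid[test] = target[test] - poly_features(z[test], degree) @ coef
    return resid


def partially_linear_gap(y_bar, x_share, z, n_folds=5):
    """Estimate theta in y_bar = theta * x_share + g(z) + noise.

    Both y_bar and x_share are residualized on z out of fold, then the
    outcome residual is regressed on the share residual.
    """
    y_res = cross_fit_residuals(y_bar, z, n_folds)
    x_res = cross_fit_residuals(x_share, z, n_folds)
    theta = float(x_res @ y_res / (x_res @ x_res))
    # Heteroskedasticity-robust (sandwich) standard error.
    score = x_res * (y_res - theta * x_res)
    se = float(np.sqrt(np.sum(score**2)) / (x_res @ x_res))
    return theta, se


def simulate(n=4000, seed=1):
    """Hypothetical aggregate data: two groups, true gap in means = 0.3."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1, 1, n)
    # Group share depends on z, so a regression that ignores z is confounded.
    x_share = np.clip(0.5 + 0.3 * z + rng.normal(0, 0.1, n), 0.02, 0.98)
    g = np.sin(2 * z)      # covariate effect shared by both groups (assumed)
    mean_g1 = 0.7 + g      # unobserved within-unit mean, group 1
    mean_g0 = 0.4 + g      # unobserved within-unit mean, group 0
    y_bar = x_share * mean_g1 + (1 - x_share) * mean_g0 + rng.normal(0, 0.05, n)
    return y_bar, x_share, z


if __name__ == "__main__":
    y_bar, x_share, z = simulate()
    theta, se = partially_linear_gap(y_bar, x_share, z)
    print(f"estimated gap: {theta:.3f} (se {se:.3f}); true gap: 0.300")
```

Cross-fitting keeps each unit's nuisance predictions out of the sample used to fit them, which is the standard device that lets flexible learners be plugged in without biasing the final regression. Recovering each group's mean separately, handling more than two groups, the sensitivity analysis, and the unit-level intervals described above all require more structure than this sketch provides.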