🤖 AI Summary
This paper investigates how the distribution of the treatment noise affects treatment effect estimation in structure-agnostic causal inference, focusing on the partially linear model. We find that double machine learning (DML) is minimax rate-optimal under Gaussian treatment noise but suboptimal under independent non-Gaussian noise, due to its lack of higher-order orthogonality. To address this, we propose the ACE procedures, a family of cumulant-driven estimators: they use structure-agnostic cumulant estimators to build higher-order orthogonal moment conditions, making the estimate *r*-th order insensitive to nuisance function estimation errors whenever the (*r*+1)-st cumulant of the treatment noise is non-zero, which yields faster convergence rates than DML under non-Gaussian noise. We also establish minimax guarantees for binary treatments in the partially linear model. In synthetic demand estimation experiments, the higher-order robust estimators outperform DML, particularly in skewed or heavy-tailed noise settings.
📝 Abstract
Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on treatment and outcomes). Here, we find that the answer depends in a surprising way on the distribution of the treatment noise. Focusing on the partially linear model of \citet{robinson1988root}, we first show that the widely adopted double machine learning (DML) estimator is minimax rate-optimal for Gaussian treatment noise, resolving an open problem of \citet{mackey2018orthogonal}. Meanwhile, for independent non-Gaussian treatment noise, we show that DML is always suboptimal by constructing new practical procedures with higher-order robustness to nuisance errors. These \emph{ACE} procedures use structure-agnostic cumulant estimators to achieve $r$-th order insensitivity to nuisance errors whenever the $(r+1)$-st treatment cumulant is non-zero. We complement these core results with novel minimax guarantees for binary treatments in the partially linear model. Finally, using synthetic demand estimation experiments, we demonstrate the practical benefits of our higher-order robust estimators.
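To make the baseline concrete, the following is a minimal sketch of the DML estimator in a partially linear model, written here as Y = θ·T + f(X) + ε with T = g(X) + η. It is not the paper's ACE procedure. Everything specific in it is an assumption made for illustration: the polynomial least-squares fit standing in for a black-box nuisance learner, the scalar confounder, the functions `sin(2x)` and `cos(3x)`, the centered exponential (skewed) treatment noise, the sample size, and the value θ = 1.5.

```python
import numpy as np


def fit_poly(x, y, degree=5):
    """Least-squares polynomial fit; a stand-in for a black-box ML nuisance learner."""
    coeffs = np.polynomial.polynomial.polyfit(x, y, degree)
    return lambda x_new: np.polynomial.polynomial.polyval(x_new, coeffs)


def dml_plr(x, t, y, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of theta in Y = theta*T + f(X) + eps."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    t_res = np.empty(len(y))
    y_res = np.empty(len(y))
    for k, test_idx in enumerate(folds):
        # Fit nuisances on the other folds, then residualize the held-out fold.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        g_hat = fit_poly(x[train_idx], t[train_idx])  # estimate of E[T | X]
        q_hat = fit_poly(x[train_idx], y[train_idx])  # estimate of E[Y | X]
        t_res[test_idx] = t[test_idx] - g_hat(x[test_idx])
        y_res[test_idx] = y[test_idx] - q_hat(x[test_idx])
    # Regress outcome residuals on treatment residuals (no intercept).
    return float(t_res @ y_res / (t_res @ t_res))


def simulate(n=4000, theta=1.5, seed=1):
    """Hypothetical data: skewed, mean-zero treatment noise independent of X."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    eta = rng.exponential(1.0, n) - 1.0  # non-Gaussian treatment noise
    eps = rng.normal(0.0, 1.0, n)        # outcome noise
    t = np.sin(2.0 * x) + eta
    y = theta * t + np.cos(3.0 * x) + eps
    return x, t, y


x, t, y = simulate()
theta_hat = dml_plr(x, t, y)
print(f"DML estimate of theta: {theta_hat:.3f}")
```

The residual-on-residual step is what makes the estimate first-order insensitive to errors in the two nuisance fits. The abstract's point is that, when the treatment noise is independent and non-Gaussian, this first-order protection can be improved on by using higher cumulants of the noise.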