🤖 AI Summary
Deep neural networks excel at prediction but suffer from substantial bias and high computational cost in unbiased, efficient inference of causal parameters—such as average treatment effects or survival curves. To address this, we propose Targeted Deep Architectures (TDA), the first framework to directly embed Targeted Maximum Likelihood Estimation (TMLE) into neural network parameter updates. TDA employs a parameter-splitting mechanism: it freezes the backbone weights while iteratively updating only a small-scale “target” subparameter along the doubly robust influence function direction. The method is architecture-agnostic and unifies debiased estimation and asymptotically efficient inference for multivariate causal parameters. Evaluated on the Infant Health and Development Program (IHDP) benchmark and survival data with informative censoring, TDA significantly reduces estimation bias and improves confidence interval coverage. It achieves theoretical rigor—guaranteeing asymptotic normality and semiparametric efficiency—while maintaining computational scalability.
📝 Abstract
Modern deep neural networks are powerful predictive tools yet often lack valid inference for causal parameters, such as treatment effects or entire survival curves. While frameworks like Double Machine Learning (DML) and Targeted Maximum Likelihood Estimation (TMLE) can debias machine-learning fits, existing neural implementations either rely on "targeted losses" that do not guarantee solving the efficient influence function equation or computationally expensive post-hoc "fluctuations" for multi-parameter settings. We propose Targeted Deep Architectures (TDA), a new framework that embeds TMLE directly into the network's parameter space with no restrictions on the backbone architecture. Specifically, TDA partitions model parameters - freezing all but a small "targeting" subset - and iteratively updates them along a targeting gradient, derived from projecting the influence functions onto the span of the gradients of the loss with respect to weights. This procedure yields plug-in estimates that remove first-order bias and produce asymptotically valid confidence intervals. Crucially, TDA easily extends to multi-dimensional causal estimands (e.g., entire survival curves) by merging separate targeting gradients into a single universal targeting update. Theoretically, TDA inherits classical TMLE properties, including double robustness and semiparametric efficiency. Empirically, on the benchmark IHDP dataset (average treatment effects) and simulated survival data with informative censoring, TDA reduces bias and improves coverage relative to both standard neural-network estimators and prior post-hoc approaches. In doing so, TDA establishes a direct, scalable pathway toward rigorous causal inference within modern deep architectures for complex multi-parameter targets.