Fast Uncertainty Quantification for Kernel-Based Estimators in Large-Scale Causal Inference

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the computational intractability of bootstrap-based uncertainty quantification for kernel methods in large-scale causal inference. We propose the first extension of the causal Bag of Little Bootstraps (cBLB) to kernel methods, combining subsampling and resampling strategies to dramatically improve computational efficiency while preserving first-order asymptotic validity. The approach is applied to kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. Empirical evaluations on both simulated data and the National Vital Statistics System birth records—comprising over 3.5 million observations—demonstrate that our method yields confidence intervals with near-nominal coverage at minimal computational cost. This enables accurate estimation of the causal effect of maternal smoking on infant birth weight and facilitates identification of optimal intervention policies.

📝 Abstract
Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As increasingly large datasets become available, such as the 2023 U.S. Natality data from the National Vital Statistics System (NVSS), which includes 3,596,017 registered births, the computational demands of these methods increase substantially. Kernel methods are known to scale poorly with sample size, and this limitation is further exacerbated by the repeated refitting required by the bootstrap. As a result, bootstrap-based inference for kernel-based estimators can become computationally infeasible in large-scale settings. In this paper, we address these challenges by extending the causal Bag of Little Bootstraps (cBLB) algorithm to kernel methods. Our approach achieves computational scalability by combining subsampling and resampling while preserving first-order uncertainty quantification and asymptotically correct coverage. We evaluate the method across three representative implementations: kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. We show in simulations that our method yields confidence intervals with nominal coverage at a fraction of the computational cost. We further demonstrate its utility in a real-world application, estimating the effect of any amount of smoking on birth weight, as well as the optimal treatment regime, using the NVSS dataset, where the standard bootstrap is computationally infeasible at this scale.
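The subsample-and-reweight scheme the abstract builds on can be sketched as follows. This is a minimal illustration of the generic Bag of Little Bootstraps idea (draw small subsamples of size b = n^γ, then resample via multinomial weights that sum to the full sample size n), applied here to a simple weighted-mean estimator rather than the paper's kernel-based causal estimators; the function names (`blb_ci`, `weighted_mean`) and parameter choices (γ = 0.6, subset and resample counts) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def blb_ci(data, estimator, n_subsets=5, gamma=0.6, n_resamples=50, alpha=0.05):
    """Bag of Little Bootstraps percentile CI (illustrative sketch).

    For each of `n_subsets` subsamples of size b = n**gamma, draw
    multinomial weights summing to n (so each subsample point stands in
    for roughly n/b original observations), recompute the weighted
    estimator, and form a percentile interval; the final CI averages
    the per-subset endpoints.
    """
    n = len(data)
    b = int(n ** gamma)
    lowers, uppers = [], []
    for _ in range(n_subsets):
        # small subsample: the only data ever touched by the estimator
        idx = rng.choice(n, size=b, replace=False)
        sub = data[idx]
        stats = []
        for _ in range(n_resamples):
            # multinomial reweighting emulates a full-size (n) bootstrap
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(sub, w))
        lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        lowers.append(lo)
        uppers.append(hi)
    return float(np.mean(lowers)), float(np.mean(uppers))

def weighted_mean(x, w):
    # stand-in for a (far costlier) kernel-based causal estimator
    return float(np.average(x, weights=w))

x = rng.normal(loc=2.0, scale=1.0, size=100_000)
lo, hi = blb_ci(x, weighted_mean)
```

The computational point is that the estimator is only ever fit on b ≈ n^0.6 points (about 1,000 here instead of 100,000), which is what makes the scheme viable for kernel methods whose cost grows superlinearly in sample size.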
Problem

Research questions and friction points this paper is trying to address.

uncertainty quantification
kernel methods
causal inference
bootstrap
large-scale data
Innovation

Methods, ideas, or system contributions that make the work stand out.

kernel methods
causal inference
uncertainty quantification
Bag of Little Bootstraps
scalable bootstrap
Matthew Kosko
Department of Population Health, New York University, New York, NY, 10016
Falco J. Bargagli-Stoffi
Department of Biostatistics, University of California, Los Angeles, CA
Lin Wang
Assistant Professor of Statistics, Purdue University
Experimental design, Sampling, Causal inference
Michele Santacatterina
NYU Grossman School of Medicine
Biostatistics, Causal Inference, Data Science, Healthcare, Real-World Data