Fast Uncertainty Quantification for Kernel-Based Estimators in Large-Scale Causal Inference

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the computational intractability of bootstrap-based uncertainty quantification for kernel methods in large-scale causal inference. We propose the first extension of the causal Bag of Little Bootstraps (cBLB) to kernel methods, combining subsampling and resampling strategies to dramatically improve computational efficiency while preserving first-order asymptotic validity. The approach is applied to kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. Empirical evaluations on both simulated data and the National Vital Statistics System birth records—comprising over 3.5 million observations—demonstrate that our method yields confidence intervals with near-nominal coverage at minimal computational cost. This enables accurate estimation of the causal effect of maternal smoking on infant birth weight and facilitates identification of optimal intervention policies.

📝 Abstract
Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As increasingly large datasets become available, such as the 2023 U.S. Natality data from the National Vital Statistics System (NVSS), which includes 3,596,017 registered births, the computational demands of these methods increase substantially. Kernel methods are known to scale poorly with sample size, and this limitation is further exacerbated by the repeated refitting required by the bootstrap. As a result, bootstrap-based inference for kernel-based estimators can become computationally infeasible in large-scale settings. In this paper, we address these challenges by extending the causal Bag of Little Bootstraps (cBLB) algorithm to kernel methods. Our approach achieves computational scalability by combining subsampling and resampling while preserving first-order uncertainty quantification and asymptotically correct coverage. We evaluate the method across three representative implementations: kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. We show in simulations that our method yields confidence intervals with nominal coverage at a fraction of the computational cost. We further demonstrate its utility in a real-world application, estimating the effect of any amount of smoking on birth weight, as well as the optimal treatment regime, using the NVSS dataset, where the standard bootstrap is computationally infeasible at this scale.
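The subsample-and-reweight scheme the abstract builds on can be sketched as follows. This is a minimal illustration of the generic Bag of Little Bootstraps idea (draw small subsamples of size b = n^γ, then resample via multinomial weights that sum to the full sample size n), applied here to a simple weighted-mean estimator rather than the paper's kernel-based causal estimators; the function names (`blb_ci`, `weighted_mean`) and parameter choices (γ = 0.6, subset and resample counts) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def blb_ci(data, estimator, n_subsets=5, gamma=0.6, n_resamples=50, alpha=0.05):
    """Bag of Little Bootstraps percentile CI (illustrative sketch).

    For each of `n_subsets` subsamples of size b = n**gamma, draw
    multinomial weights summing to n (so each subsample point stands in
    for roughly n/b original observations), recompute the weighted
    estimator, and form a percentile interval; the final CI averages
    the per-subset endpoints.
    """
    n = len(data)
    b = int(n ** gamma)
    lowers, uppers = [], []
    for _ in range(n_subsets):
        # small subsample: the only data ever touched by the estimator
        idx = rng.choice(n, size=b, replace=False)
        sub = data[idx]
        stats = []
        for _ in range(n_resamples):
            # multinomial reweighting emulates a full-size (n) bootstrap
            w = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(sub, w))
        lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        lowers.append(lo)
        uppers.append(hi)
    return float(np.mean(lowers)), float(np.mean(uppers))

def weighted_mean(x, w):
    # stand-in for a (far costlier) kernel-based causal estimator
    return float(np.average(x, weights=w))

x = rng.normal(loc=2.0, scale=1.0, size=100_000)
lo, hi = blb_ci(x, weighted_mean)
```

The computational point is that the estimator is only ever fit on b ≈ n^0.6 points (about 1,000 here instead of 100,000), which is what makes the scheme viable for kernel methods whose cost grows superlinearly in sample size.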
Problem

Research questions and friction points this paper is trying to address.

uncertainty quantification
kernel methods
causal inference
bootstrap
large-scale data
Innovation

Methods, ideas, or system contributions that make the work stand out.

kernel methods
causal inference
uncertainty quantification
Bag of Little Bootstraps
scalable bootstrap
Matthew Kosko
Department of Population Health, New York University, New York, NY, 10016
Falco J. Bargagli-Stoffi
Department of Biostatistics, University of California, Los Angeles, CA
Lin Wang
Assistant Professor of Statistics, Purdue University
Experimental design, Sampling, Causal inference
Michele Santacatterina
NYU Grossman School of Medicine
Biostatistics, Causal Inference, Data Science, Healthcare, Real-World Data