AI Summary
In additive noise models, regression functions estimated with flexible machine learning methods can induce spurious dependence between residuals and covariates, compromising the validity of downstream inference. This work proposes a semiparametrically efficient inference method for kernel measures of noise heterogeneity, constructing a Hilbert space-valued one-step estimator of the kernel covariance operator between covariates and residuals. Coupled with a bootstrap calibration procedure, the estimator yields tests for residual independence and goodness of fit. The method accommodates settings with additional covariates, supports inference on heterogeneity in residual noise distributions across treatment groups, and provides asymptotically efficient confidence intervals for the kernel dependence measure. Simulations show improved calibration and statistical power relative to naive plug-in residual methods.
Abstract
We develop semiparametrically efficient inference for kernel measures of noise heterogeneity in additive noise models. In many applications, the regression function is estimated using flexible machine learning methods. Downstream procedures based on the resulting residuals can then inherit first-stage bias: regression error may induce spurious dependence between covariates and residuals, invalidating the assumptions needed for standard analysis. We construct a novel Hilbert-valued one-step estimator of the kernel covariance operator between covariates and residuals. Our estimator yields bootstrap-calibrated tests for residual independence and goodness of fit in additive noise models, while also providing asymptotically efficient confidence intervals for the kernel dependence measure under noise heterogeneity. The framework extends to settings with additional covariates, enabling inference on distributional heterogeneity of residual noise across treatment groups. Simulations show improved calibration and power relative to naive plug-in residual methods.
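To make the baseline concrete, the following is a minimal sketch of the naive plug-in approach the abstract compares against: fit a regression, form residuals, and compute a kernel dependence statistic between covariates and residuals. It is not the one-step estimator proposed in the work. The function names (`rbf_gram`, `hsic_biased`, `naive_plugin_hsic`, `linear_fit_predict`), the Gaussian kernel, the fixed bandwidths, and the linear first stage are all assumptions chosen for illustration, and calibration of the statistic (for example by bootstrap) is omitted.

```python
import numpy as np


def rbf_gram(x, bandwidth):
    """Gaussian (RBF) Gram matrix for a 1-D or 2-D sample."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_norms = np.sum(x**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def hsic_biased(x, r, bw_x=1.0, bw_r=1.0):
    """Biased (V-statistic) HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    K = rbf_gram(x, bw_x)
    L = rbf_gram(r, bw_r)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / n**2


def linear_fit_predict(x, y):
    """Hypothetical first stage: ordinary least squares line."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope * x + intercept


def naive_plugin_hsic(x, y, fit_predict, **kernel_kwargs):
    """Plug fitted residuals directly into the dependence statistic."""
    residuals = y - fit_predict(x, y)
    return hsic_biased(x, residuals, **kernel_kwargs)


# Illustrative data (assumed): noise scale grows with |x|.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
y = 1.5 * x + np.abs(x) * rng.normal(size=200)
stat = naive_plugin_hsic(x, y, linear_fit_predict)
print(f"naive plug-in HSIC statistic: {stat:.5f}")
```

Because the residuals are computed from an estimated regression function, this statistic inherits first-stage error, which is the source of the miscalibration the abstract describes.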