🤖 AI Summary
This work addresses the challenge of unobserved confounding in causal inference under a single-environment setting with nonlinear observations, where hidden confounders can severely undermine reliability. The authors propose Kernel Regression Confounder Detection (KRCD), the first method capable of testing for the presence of confounding in this setting. KRCD models complex dependencies among variables in a reproducing kernel Hilbert space and constructs a test statistic by comparing standard and higher-order kernel regression coefficients. Theoretically, the authors establish that agreement of these regression coefficients in the infinite-sample limit is equivalent to the absence of unobserved confounding, and prove that the finite-sample difference converges to a zero-mean Gaussian distribution. Empirical evaluations on synthetic data and the Twins dataset demonstrate that KRCD significantly outperforms existing approaches in both detection accuracy and computational efficiency.
📝 Abstract
Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environments, limiting their applicability to nonlinear single-environment settings. To bridge this gap, we propose Kernel Regression Confounder Detection (KRCD), a novel method for detecting unobserved confounding in nonlinear observational data under single-environment conditions. KRCD leverages reproducing kernel Hilbert spaces to model complex dependencies. By comparing standard and higher-order kernel regressions, we derive a test statistic whose significant deviation from zero indicates unobserved confounding. Theoretically, we prove two key results: first, with infinite samples, the regression coefficients coincide if and only if no unobserved confounders exist; second, finite-sample differences converge to zero-mean Gaussian distributions with tractable variance. Extensive experiments on synthetic benchmarks and the Twins dataset demonstrate that KRCD not only outperforms existing baselines but also achieves superior computational efficiency.
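The abstract does not spell out KRCD's construction, but the underlying intuition — that a standard regression coefficient and a higher-order-moment analogue agree exactly when no hidden confounder is present — can be illustrated in the simplest linear special case. The sketch below is a hypothetical toy illustration, not the authors' kernel-based method: it compares the ordinary moment slope E[XY]/E[X²] with the higher-order estimate E[X³Y]/E[X⁴], which coincide asymptotically when the residual is independent of X but diverge when a (non-Gaussian) confounder drives both variables.

```python
import numpy as np

# Toy illustration (NOT the paper's KRCD): in a linear model with a
# non-Gaussian confounder, standard and higher-order moment slopes differ.
rng = np.random.default_rng(0)
n = 500_000
b = 1.5  # assumed true causal effect of X on Y in this toy example

def slopes(x, y):
    # Ordinary moment slope E[XY]/E[X^2] and its higher-order
    # analogue E[X^3 Y]/E[X^4]; both recover b when the residual
    # is independent of X.
    return (np.mean(x * y) / np.mean(x ** 2),
            np.mean(x ** 3 * y) / np.mean(x ** 4))

# Case 1 -- no confounding: X is non-Gaussian, noise independent of X.
u = rng.exponential(1.0, n) - 1.0        # skewed latent variable
x = u + rng.normal(size=n)
y = b * x + rng.normal(size=n)
b1, b2 = slopes(x, y)
d_no = b1 - b2

# Case 2 -- hidden confounder: u now drives both X and Y.
x = u + rng.normal(size=n)
y = b * x + u + rng.normal(size=n)
b1, b2 = slopes(x, y)
d_conf = b1 - b2

print(f"no confounder:   diff = {d_no:+.3f}")    # near zero
print(f"with confounder: diff = {d_conf:+.3f}")  # clearly nonzero
```

The non-Gaussianity of the latent variable matters here: for jointly Gaussian data the two moment slopes coincide even under confounding, which is one reason moving beyond simple second-moment regressions (as KRCD does in an RKHS) is needed for a general test.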