🤖 AI Summary
This work addresses the high computational cost and poor calibration of existing kernel-based conditional independence tests, which typically rely on kernel ridge regression. The authors propose a regression-agnostic kernel test that embeds variables in a reproducing kernel Hilbert space and assesses conditional independence by testing the marginal independence of regression residuals. This approach is compatible with arbitrary regression estimators, including tree-based models, thereby removing the dependence on a specific regression method. Built on the Generalised Hilbertian Covariance Measure framework, the method provides uniformly asymptotically valid significance levels. Empirical evaluations demonstrate that the proposed test achieves more accurate Type I error control across diverse data-generating mechanisms while exhibiting statistical power comparable to or better than state-of-the-art methods.
📝 Abstract
We consider the problem of conditional independence (CI) testing and adopt a kernel-based approach. Kernel-based CI tests embed variables in reproducing kernel Hilbert spaces, regress their embeddings on the conditioning variables, and test the resulting residuals for marginal independence. This approach yields tests that are sensitive to a broad range of conditional dependencies. Existing methods, however, rely heavily on kernel ridge regression, which is computationally expensive when properly tuned and yields poorly calibrated tests when left untuned, limiting their practical usefulness. We propose the Generalised Kernel Covariance Measure (GKCM), a regression-model-agnostic kernel-based CI test that accommodates a broad class of regression estimators. Building on the Generalised Hilbertian Covariance Measure framework (Lundborg et al., 2022), we characterise conditions under which GKCM satisfies uniform asymptotic level guarantees. In simulations, GKCM paired with tree-based regression models frequently outperforms state-of-the-art CI tests across a diverse range of data-generating processes, achieving better Type I error control and competitive or superior power.
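To make the residual-testing idea concrete, here is a simplified sketch, not the authors' GKCM: it regresses the raw variables (rather than their kernel embeddings) on the conditioning variable with a tree-based model, then applies a permutation-based HSIC test to the residuals. The helper names (`rbf_gram`, `hsic`, `residual_ci_test`) and the use of scikit-learn's `RandomForestRegressor` are illustrative assumptions; GKCM itself obtains calibration from asymptotic theory rather than permutations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rbf_gram(v, bandwidth=1.0):
    # Pairwise RBF (Gaussian) kernel matrix for a 1-D sample vector.
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * bandwidth**2))

def hsic(rx, ry):
    # Biased empirical HSIC between two residual vectors.
    n = len(rx)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(H @ rbf_gram(rx) @ H @ rbf_gram(ry)) / n**2

def residual_ci_test(x, y, z, n_perm=200, seed=0):
    # Step 1: regress x and y on z with an arbitrary (here tree-based) model.
    rx = x - RandomForestRegressor(random_state=0).fit(z, x).predict(z)
    ry = y - RandomForestRegressor(random_state=0).fit(z, y).predict(z)
    # Step 2: permutation-test the residuals for marginal independence.
    rng = np.random.default_rng(seed)
    stat = hsic(rx, ry)
    perm = [hsic(rx, rng.permutation(ry)) for _ in range(n_perm)]
    return (1 + sum(s >= stat for s in perm)) / (1 + n_perm)  # p-value

# Toy example where X and Y are conditionally independent given Z.
rng = np.random.default_rng(1)
z = rng.normal(size=(300, 1))
x = z[:, 0] + 0.3 * rng.normal(size=300)
y = z[:, 0] ** 2 + 0.3 * rng.normal(size=300)
p = residual_ci_test(x, y, z)
```

Swapping `RandomForestRegressor` for any other estimator illustrates the regression-agnostic design; the paper's contribution is characterising which estimators preserve the test's uniform asymptotic level.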