🤖 AI Summary
This paper addresses the frequent practical failures of the kernel-based conditional independence (KCI) test by systematically analyzing its finite-sample behavior. We identify estimation error in the conditional mean embeddings as a key driver of Type-I error, and show that the choice of conditioning kernel, overlooked in previous work, significantly affects both statistical power and the false positive rate. We further show that the generalized covariance measure (GCM), which underlies many recent tests, is nearly a special case of KCI. Working within reproducing kernel Hilbert space (RKHS) theory and statistical hypothesis testing, we characterize a previously unrecognized trade-off in conditioning-kernel selection: an appropriate kernel is necessary for good test power, but it also tends to inflate the Type-I error. These findings provide both theoretical foundations and practical guidance for reliable conditional independence testing in causal discovery, algorithmic fairness assessment, and out-of-distribution robustness evaluation.
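Both mechanisms named in the summary, error in the conditional mean embedding estimate and the choice of conditioning kernel, appear as explicit knobs in a KCI-style statistic. The Python/NumPy sketch below assumes one common formulation, which is not necessarily the exact statistic used in the paper: the conditional mean embeddings of X and Y given Z are estimated by kernel ridge regression, and the resulting residual features are combined with a separate conditioning kernel on Z. The Gaussian kernels, the bandwidths (`bw_x`, `bw_y`, `bw_z`, `bw_cond`), the regularizer `ridge`, and the function names are hypothetical choices for illustration. Only the statistic is computed; calibrating its null distribution is omitted.

```python
import numpy as np


def gaussian_gram(a, bandwidth):
    """Gaussian (RBF) Gram matrix for samples stored as the rows of `a`."""
    sq_dists = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kci_style_statistic(x, y, z, ridge=1e-2, bw_x=1.0, bw_y=1.0, bw_z=1.0, bw_cond=1.0):
    """Biased (V-statistic) estimate of the squared Hilbert-Schmidt norm of
    E[(phi(X) - mu_X|Z(Z)) (x) (psi(Y) - mu_Y|Z(Z)) (x) k_cond(Z, .)].

    Hypothetical formulation for illustration; `ridge` regularizes the kernel
    ridge regression that estimates both conditional mean embeddings.
    """
    x, y, z = (np.asarray(v, dtype=float).reshape(len(v), -1) for v in (x, y, z))
    n = x.shape[0]
    k_x = gaussian_gram(x, bw_x)
    k_y = gaussian_gram(y, bw_y)
    k_z = gaussian_gram(z, bw_z)      # kernel for regressing features on Z
    k_c = gaussian_gram(z, bw_cond)   # conditioning kernel on Z
    # Residual operator I - K_Z (K_Z + n*ridge*I)^{-1}, which equals
    # n*ridge*(K_Z + n*ridge*I)^{-1}; row i expresses phi(x_i) - mu_hat(z_i)
    # as a combination of the features phi(x_1), ..., phi(x_n).
    eye = np.eye(n)
    resid = np.linalg.solve(k_z + n * ridge * eye, n * ridge * eye)
    # Gram matrices of the residual features, e.g.
    # r_x[i, j] = <phi(x_i) - mu_hat(z_i), phi(x_j) - mu_hat(z_j)>.
    r_x = resid @ k_x @ resid.T
    r_y = resid @ k_y @ resid.T
    # The elementwise product of Gram matrices is the Gram matrix of the
    # tensor-product features; summing all entries gives the squared norm.
    return float(np.sum(r_x * r_y * k_c) / n ** 2)
```

On simulated data where Z drives both X and Y, recomputing the statistic while varying `ridge` (which controls the embedding estimation error) or `bw_cond` (the conditioning kernel) gives a quick feel for how sensitive it is to each choice.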
📝 Abstract
Tests of conditional independence (CI) underpin a number of important problems in machine learning and statistics, from causal discovery to evaluation of predictor fairness and out-of-distribution robustness. Shah and Peters (2020) showed that, unlike in the unconditional case, no test that is universally valid in finite samples can achieve nontrivial power. While informative, this result (based on "hiding" dependence) does not seem to explain the frequent practical failures observed with popular CI tests. We investigate the Kernel-based Conditional Independence (KCI) test (of which, we show, the Generalized Covariance Measure underlying many recent tests is nearly a special case) and identify the major factors underlying its practical behavior. We highlight the key role of errors in the conditional mean embedding estimate for the Type-I error, and point out the importance of selecting an appropriate conditioning kernel, which was not recognized in previous work: such a kernel is necessary for good test power but also tends to inflate the Type-I error.
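For reference, the Generalized Covariance Measure of Shah and Peters (2020) regresses X on Z and Y on Z separately, forms the residual products R_i = (x_i - f_hat(z_i)) * (y_i - g_hat(z_i)), and compares the normalized statistic T = sqrt(n) * mean(R) / std(R) against a standard normal. The sketch below handles scalar X and Y. The Gaussian kernel ridge regression used for f_hat and g_hat, its `bandwidth` and `ridge` defaults, and the function names are hypothetical choices; any regression method could be substituted. With linear kernels on X and Y, inner products of residual features reduce to exactly these residual products, which gives some intuition for why GCM sits close to KCI; the precise relationship is the one established in the paper.

```python
import math

import numpy as np


def krr_fit_predict(z, target, bandwidth=1.0, ridge=1e-2):
    """In-sample predictions of a Gaussian kernel ridge regression of `target` on `z`."""
    z = np.asarray(z, dtype=float).reshape(len(z), -1)
    n = z.shape[0]
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    alpha = np.linalg.solve(k + n * ridge * np.eye(n), np.asarray(target, dtype=float))
    return k @ alpha


def gcm_test(x, y, z, **krr_kwargs):
    """GCM test of X independent of Y given Z, for scalar X and Y.

    Returns (statistic, two-sided p-value from the N(0, 1) approximation).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Products of the residuals from the two separate regressions on Z.
    r = (x - krr_fit_predict(z, x, **krr_kwargs)) * (y - krr_fit_predict(z, y, **krr_kwargs))
    n = r.shape[0]
    # Normalized mean of the residual products (std uses the 1/n convention).
    stat = math.sqrt(n) * r.mean() / r.std()
    # Two-sided p-value: 2 * (1 - Phi(|T|)) = erfc(|T| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return float(stat), float(p_value)
```

Because the test only looks at the mean of residual products, its calibration depends directly on how well the two regressions estimate the conditional means, which mirrors the role the abstract assigns to conditional mean embedding error in KCI.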