🤖 AI Summary
Conditional independence testing faces two coupled difficulties: inflated Type I error rates (loss of test-level control) and low statistical power. This paper proposes a data-efficient, kernel-based testing framework whose test statistic is built on nonparametric kernel ridge regression. Because regression bias in that statistic inflates the false-positive rate, the paper introduces three bias-control strategies: data splitting, use of auxiliary data, and (where possible) restriction to simpler function classes. Theoretically, the approach guarantees that the Type I error rate converges to the nominal significance level while retaining power against complex dependence structures. Experiments on synthetic and real-world datasets show precise Type I error control and higher power than state-of-the-art competitors such as KCIT and RCIT.
📝 Abstract
We describe a data-efficient, kernel-based approach to statistical testing of conditional independence. A major challenge of conditional independence testing, absent in tests of unconditional independence, is to obtain the correct test level (the specified upper bound on the rate of false positives), while still attaining competitive test power. Excess false positives arise due to bias in the test statistic, which is obtained using nonparametric kernel ridge regression. We propose three methods for bias control to correct the test level, based on data splitting, auxiliary data, and (where possible) simpler function classes. We show these combined strategies are effective both for synthetic and real-world data.
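To make the data-splitting idea concrete, here is a minimal illustrative sketch (not the paper's exact statistic or implementation): regress X and Y on Z with kernel ridge regression on one half of the sample, then test whether the held-out residuals are dependent via a permutation test on their cross-covariance. All function names, the residual cross-covariance statistic, and the hyperparameters (`lam`, `gamma`, `n_perm`) are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def krr_residuals(Z_train, T_train, Z_test, T_test, lam=1e-2, gamma=1.0):
    """Fit kernel ridge regression of T on Z using the training half only,
    and return residuals on the held-out half (the data-splitting step that
    decouples regression error from the test statistic)."""
    K = rbf_kernel(Z_train, Z_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(Z_train)), T_train)
    pred = rbf_kernel(Z_test, Z_train, gamma) @ alpha
    return T_test - pred

def split_ci_test(X, Y, Z, n_perm=500, lam=1e-2, gamma=1.0, seed=0):
    """Illustrative residual-based conditional independence test.

    Statistic: squared mean cross-product of held-out KRR residuals.
    P-value: permutation of one residual vector against the other.
    Returns a p-value in (0, 1]; small values suggest X ⊥̸ Y | Z.
    """
    rng = np.random.default_rng(seed)
    half = len(X) // 2
    rx = krr_residuals(Z[:half], X[:half], Z[half:], X[half:], lam, gamma)
    ry = krr_residuals(Z[:half], Y[:half], Z[half:], Y[half:], lam, gamma)
    stat = np.mean(rx * ry) ** 2
    null = np.array([np.mean(rng.permutation(rx) * ry) ** 2 for _ in range(n_perm)])
    return (1 + np.sum(null >= stat)) / (1 + n_perm)
```

Under the null (X and Y conditionally independent given Z), the held-out residuals are approximately uncorrelated, so the statistic stays small; under the alternative, residual dependence survives the regression and the permutation p-value drops. The paper's other two strategies (auxiliary data and simpler function classes) target the same regression bias without sacrificing the full sample.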