🤖 AI Summary
Conditional independence (CI) testing is fundamental to causal inference and graphical modeling, yet conventional methods often rely on restrictive structural assumptions, limiting practical performance. While kernel-based partial covariance operator approaches offer greater principled grounding, they suffer from poor adaptivity, slow convergence, and limited scalability. This paper introduces a learnable representation framework grounded in spectral decomposition of the partial covariance operator—marking the first deep integration of spectral representation learning with CI testing. We establish a theoretical mapping from representation error to statistical power and propose a two-level contrastive learning algorithm that ensures statistical validity while enhancing computational efficiency. We prove asymptotic validity and controllable power for the resulting test. Experiments demonstrate that our method significantly outperforms existing kernel-based approaches on high-dimensional real-world data, achieving superior robustness, scalability, and statistical reliability.
📝 Abstract
Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity on real-world data. Kernel methods using the partial covariance operator offer a more principled approach but suffer from limited adaptivity, slow convergence, and poor scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of the partial covariance operator and use them to construct a simple test statistic, reminiscent of the Hilbert-Schmidt Independence Criterion (HSIC). We also introduce a practical bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.