AI Summary
Causal graph learning scales poorly to high-dimensional settings (hundreds of variables) with large samples (thousands of observations) when the data are nonlinear and continuous or mixed. To address this, the paper introduces two tools based on truncated basis function expansions: the BF-BIC score and the BF-LRT conditional independence test. Truncated additive models are combined with an invertible reparameterization to extend causal discovery to post-nonlinear models. Discrete variables are handled uniformly via a degenerate-Gaussian embedding, which supports an efficient hybrid search framework. BF-BIC is shown to be consistent under additive models, and BF-LRT is substantially faster than kernel tests. In simulations with neural causal models, the approach is more accurate and faster than established methods such as KCI and RFCI. An application to Canadian wildfire risk modeling illustrates its use on a real-world system.
Abstract
Learning graphical conditional independence structures from nonlinear, continuous or mixed data is a central challenge in machine learning and the sciences, and many existing methods struggle to scale to thousands of samples or hundreds of variables. We introduce two basis-expansion tools for scalable causal discovery. First, the Basis Function BIC (BF-BIC) score uses truncated additive expansions to approximate nonlinear dependencies. BF-BIC is theoretically consistent under additive models and extends to post-nonlinear (PNL) models via an invertible reparameterization. It remains robust under moderate interactions and supports mixed data through a degenerate-Gaussian embedding for discrete variables. In simulations with fully nonlinear neural causal models (NCMs), BF-BIC outperforms kernel- and constraint-based methods (e.g., KCI, RFCI) in both accuracy and runtime. Second, the Basis Function Likelihood Ratio Test (BF-LRT) provides an approximate conditional independence test that is substantially faster than kernel tests while retaining competitive accuracy. Extensive simulations and a real-data application to Canadian wildfire risk show that, when integrated into hybrid searches, BF-based methods enable interpretable and scalable causal discovery. Implementations are available in Python, R, and Java.
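To make the two ideas concrete, the following minimal sketch shows how a BIC-style score and a likelihood ratio statistic can be built from a truncated additive basis expansion. It is an illustration under stated assumptions, not the authors' implementation: the Legendre basis, the truncation degree of 3, the Gaussian residual model, and every function name (`basis_expand`, `design_matrix`, `local_bic`, `lrt_statistic`) are hypothetical choices made here. The PNL reparameterization and the degenerate-Gaussian embedding for discrete variables are not covered.

```python
import numpy as np
from numpy.polynomial import legendre


def basis_expand(x, degree):
    """Truncated Legendre expansion of one variable (degrees 1..degree)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    # Rescale to [-1, 1], the natural domain of Legendre polynomials.
    z = 2.0 * (x - x.min()) / span - 1.0 if span > 0 else np.zeros_like(x)
    # legvander returns columns for degrees 0..degree; drop the constant.
    return legendre.legvander(z, degree)[:, 1:]


def design_matrix(data, cols, degree):
    """Intercept plus one additive basis block per column in `cols`."""
    n = data.shape[0]
    blocks = [np.ones((n, 1))]
    blocks += [basis_expand(data[:, j], degree) for j in cols]
    return np.hstack(blocks)


def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of a least-squares fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    sigma2 = max(rss / n, 1e-12)  # guard against a perfect fit
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


def local_bic(data, child, parents, degree=3):
    """BIC-style local score for `child` given `parents` (higher is better)."""
    X = design_matrix(data, parents, degree)
    n, k = X.shape
    return 2.0 * gaussian_loglik(data[:, child], X) - k * np.log(n)


def lrt_statistic(data, x, y, cond, degree=3):
    """Likelihood ratio statistic for 'y independent of x given cond'.

    Compares y ~ basis(cond) against y ~ basis(cond) + basis(x).
    Returns the statistic and its degrees of freedom; an approximate
    p-value would come from a chi-square distribution with that df.
    """
    X0 = design_matrix(data, list(cond), degree)
    X1 = design_matrix(data, list(cond) + [x], degree)
    ll0 = gaussian_loglik(data[:, y], X0)
    ll1 = gaussian_loglik(data[:, y], X1)
    return 2.0 * (ll1 - ll0), X1.shape[1] - X0.shape[1]


# Toy data: x1 depends on x0 through a purely nonlinear (quadratic) effect,
# while x2 is independent noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
x1 = x0 ** 2 + 0.3 * rng.normal(size=1000)
x2 = rng.normal(size=1000)
data = np.column_stack([x0, x1, x2])

print("score(x1 | {})   =", local_bic(data, 1, []))
print("score(x1 | {x0}) =", local_bic(data, 1, [0]))
print("LRT x0 vs x1      =", lrt_statistic(data, 0, 1, []))
print("LRT x2 vs x1 | x0 =", lrt_statistic(data, 2, 1, [0]))
```

In this toy setup the score should favor `x0` as a parent of `x1` even though their linear correlation is close to zero, and the statistic for `x2` should be small relative to its degrees of freedom. Each fit is an ordinary least-squares solve on a design matrix whose width grows only with the number of parents times the truncation degree, which is the intuition behind the runtime advantage over kernel tests claimed in the abstract.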