Scalable Causal Discovery from Recursive Nonlinear Data via Truncated Basis Function Scores and Tests

📅 2025-10-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited scalability of causal graph learning on nonlinear, continuous, or mixed data with hundreds of variables and thousands of samples, this paper introduces two tools based on truncated basis function expansions: the BF-BIC score and the BF-LRT conditional independence test. BF-BIC approximates nonlinear dependencies with truncated additive models. It is consistent under additive models and extends to post-nonlinear (PNL) models through an invertible reparameterization. Discrete variables are handled through a degenerate-Gaussian embedding, so the same machinery applies to mixed data within hybrid searches. In simulations with fully nonlinear neural causal models (NCMs), BF-BIC outperforms kernel- and constraint-based methods such as KCI and RFCI in both accuracy and runtime, and BF-LRT is substantially faster than kernel tests while retaining competitive accuracy. The approach is also applied to Canadian wildfire risk modeling, illustrating its practical utility on real-world data.
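The BF-BIC idea can be illustrated with a minimal local-score sketch. Several details here are assumptions, not taken from the paper: each parent is min-max scaled to [-1, 1] and expanded in Legendre polynomials up to a hypothetical truncation `degree`, the child is regressed on those terms by ordinary least squares with Gaussian errors, and the score is the maximized log-likelihood minus a BIC penalty scaled by a hypothetical `penalty` factor. The function names are illustrative and are not the authors' API.

```python
import math
import random

def legendre_terms(x, degree):
    """Legendre polynomials P_1..P_degree at x in [-1, 1], via the three-term recurrence."""
    out, p_prev, p_curr = [], 1.0, x
    for k in range(1, degree + 1):
        out.append(p_curr)
        # (k + 1) P_{k+1}(x) = (2k + 1) x P_k(x) - k P_{k-1}(x)
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return out

def additive_design(parents, degree, n):
    """Intercept plus a truncated Legendre expansion of each parent (scaled to [-1, 1])."""
    rows = [[1.0] for _ in range(n)]
    for col in parents:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        for row, v in zip(rows, col):
            row.extend(legendre_terms(2 * (v - lo) / span - 1, degree))
    return rows

def ols_rss(design, y):
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan elimination)."""
    m = len(design[0])
    # Augmented matrix [X^T X + tiny ridge | X^T y]; the ridge only guards against exact singularity.
    aug = [
        [sum(r[i] * r[j] for r in design) + (1e-8 if i == j else 0.0) for j in range(m)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(m)
    ]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][m] / aug[i][i] for i in range(m)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(design, y))

def bf_bic_local(child, parents, degree=3, penalty=1.0):
    """Gaussian log-likelihood of child given basis(parents), minus (penalty * k / 2) * log(n)."""
    n = len(child)
    design = additive_design(parents, degree, n)
    sigma2 = max(ols_rss(design, child) / n, 1e-12)
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)
    k = len(design[0]) + 1  # regression coefficients plus the noise variance
    return loglik - 0.5 * penalty * k * math.log(n)

if __name__ == "__main__":
    random.seed(0)
    n = 300
    x = [random.uniform(-2, 2) for _ in range(n)]
    z = [random.uniform(-2, 2) for _ in range(n)]
    y = [math.sin(v) + 0.2 * random.gauss(0, 1) for v in x]  # y depends on x only
    for name, parents in [("{}", []), ("{x}", [x]), ("{z}", [z])]:
        print(name, round(bf_bic_local(y, parents), 1))
```

On this toy data the parent set {x} should score highest. Because the expansion is additive, each extra parent adds only `degree` coefficients, which is one plausible source of the scalability the summary describes; interactions between parents are not modeled by this sketch.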

📝 Abstract

Learning graphical conditional independence structures from nonlinear, continuous or mixed data is a central challenge in machine learning and the sciences, and many existing methods struggle to scale to thousands of samples or hundreds of variables. We introduce two basis-expansion tools for scalable causal discovery. First, the Basis Function BIC (BF-BIC) score uses truncated additive expansions to approximate nonlinear dependencies. BF-BIC is theoretically consistent under additive models and extends to post-nonlinear (PNL) models via an invertible reparameterization. It remains robust under moderate interactions and supports mixed data through a degenerate-Gaussian embedding for discrete variables. In simulations with fully nonlinear neural causal models (NCMs), BF-BIC outperforms kernel- and constraint-based methods (e.g., KCI, RFCI) in both accuracy and runtime. Second, the Basis Function Likelihood Ratio Test (BF-LRT) provides an approximate conditional independence test that is substantially faster than kernel tests while retaining competitive accuracy. Extensive simulations and a real-data application to Canadian wildfire risk show that, when integrated into hybrid searches, BF-based methods enable interpretable and scalable causal discovery. Implementations are available in Python, R, and Java.
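A conditional independence test in the spirit of BF-LRT can be sketched as a nested-regression likelihood ratio. The construction is an assumption, not the paper's exact test: to test X ⊥ Y | Z, Y is regressed on a truncated Legendre expansion of Z with and without the expansion of X, and the statistic n·log(RSS₀/RSS₁) is compared with a chi-square distribution whose degrees of freedom equal the number of added terms (`degree`). The actual BF-LRT may differ, for example by testing in both directions. All names are hypothetical.

```python
import math
import random

def legendre_terms(x, degree):
    """Legendre polynomials P_1..P_degree at x in [-1, 1], via the three-term recurrence."""
    out, p_prev, p_curr = [], 1.0, x
    for k in range(1, degree + 1):
        out.append(p_curr)
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return out

def additive_design(columns, degree, n):
    """Intercept plus a truncated Legendre expansion of each column (scaled to [-1, 1])."""
    rows = [[1.0] for _ in range(n)]
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        for row, v in zip(rows, col):
            row.extend(legendre_terms(2 * (v - lo) / span - 1, degree))
    return rows

def ols_rss(design, y):
    """Residual sum of squares of an OLS fit (normal equations, Gauss-Jordan elimination)."""
    m = len(design[0])
    aug = [
        [sum(r[i] * r[j] for r in design) + (1e-8 if i == j else 0.0) for j in range(m)]
        + [sum(r[i] * yi for r, yi in zip(design, y))]
        for i in range(m)
    ]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][m] / aug[i][i] for i in range(m)]
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(design, y))

def chi2_sf(stat, df):
    """Chi-square upper tail 1 - P(df/2, stat/2), with P from its power series."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    if x - a > 40 * math.sqrt(a) + 40:  # far in the tail: p-value is effectively 0
        return 0.0
    term = total = 1.0 / a
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= x / (a + k)
        total += term
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))

def bf_lrt_pvalue(x, y, z, degree=3):
    """Approximate p-value for X independent of Y given Z (z is a list of columns)."""
    n = len(y)
    rss0 = ols_rss(additive_design(z, degree, n), y)        # null: Y ~ basis(Z)
    rss1 = ols_rss(additive_design(z + [x], degree, n), y)  # alternative: Y ~ basis(Z) + basis(X)
    stat = n * math.log(rss0 / rss1)
    return chi2_sf(stat, degree)

if __name__ == "__main__":
    random.seed(1)
    n = 400
    x = [random.uniform(-2, 2) for _ in range(n)]
    y_dep = [v * v + 0.3 * random.gauss(0, 1) for v in x]     # nonlinear, uncorrelated with x
    z = [math.sin(v) + 0.3 * random.gauss(0, 1) for v in x]  # chain x -> z -> y_ci
    y_ci = [w * w + 0.3 * random.gauss(0, 1) for w in z]
    print("x vs y_dep given {}:", bf_lrt_pvalue(x, y_dep, []))
    print("x vs y_ci given {z}:", bf_lrt_pvalue(x, y_ci, [z]))
```

The chi-square tail comes from the power series of the regularized lower incomplete gamma function, which keeps the sketch dependency-free; with SciPy available, `scipy.stats.chi2.sf` is the natural replacement. Each test costs two small least-squares fits, which is why a test of this shape can be far cheaper than kernel tests that work with n-by-n Gram matrices.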

Problem

Research questions and friction points this paper is trying to address.

Scalable causal discovery from nonlinear, continuous, or mixed data with thousands of samples and hundreds of variables

Learning graphical conditional independence structures from continuous or mixed variables

Developing efficient methods that outperform kernel-based approaches in accuracy and runtime

Innovation

Methods, ideas, or system contributions that make the work stand out.

BF-BIC score uses truncated additive basis expansions to approximate nonlinear dependencies

BF-LRT provides a fast, approximate conditional independence test

Mixed data are supported via a degenerate-Gaussian embedding of discrete variables
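The degenerate-Gaussian embedding can be pictured as turning each discrete variable into indicator columns that enter the same regression design as the basis-expanded continuous variables. This is a sketch under stated assumptions: the paper's exact construction is not given here, and this version drops a reference category so that the design stays full rank. With all categories kept, the indicators sum to 1 in every row, which is the linear degeneracy the name refers to. All function names and the toy values are hypothetical.

```python
def one_hot_embed(values, drop_reference=True):
    """Embed a discrete column as 0/1 indicator columns, one per (kept) category.

    With all categories kept, the indicators sum to 1 in every row, so together with an
    intercept they are linearly dependent (a degenerate Gaussian block). Dropping the first
    sorted category as a reference keeps a regression design full rank.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_reference else categories
    return categories, [[1.0 if v == c else 0.0 for v in values] for c in kept]

def legendre_terms(x, degree):
    """Legendre polynomials P_1..P_degree at x in [-1, 1], via the three-term recurrence."""
    out, p_prev, p_curr = [], 1.0, x
    for k in range(1, degree + 1):
        out.append(p_curr)
        p_prev, p_curr = p_curr, ((2 * k + 1) * x * p_curr - k * p_prev) / (k + 1)
    return out

def mixed_design(continuous, discrete, degree, n):
    """Intercept, Legendre terms for continuous columns, indicator columns for discrete ones."""
    rows = [[1.0] for _ in range(n)]
    for col in continuous:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        for row, v in zip(rows, col):
            row.extend(legendre_terms(2 * (v - lo) / span - 1, degree))
    for col in discrete:
        _, indicators = one_hot_embed(col)
        for i, row in enumerate(rows):
            row.extend(ind[i] for ind in indicators)
    return rows

if __name__ == "__main__":
    cont = [12.0, 25.5, 31.2, 18.4]  # toy continuous column (illustrative values only)
    disc = ["a", "b", "a", "c"]      # toy discrete column with three categories
    design = mixed_design([cont], [disc], degree=2, n=4)
    print(len(design[0]))  # 1 intercept + 2 Legendre terms + 2 indicators = 5
```

Once discrete variables are embedded this way, a least-squares score or test built on the same design (such as the BF-BIC and BF-LRT sketches elsewhere on this page) can treat mixed parent sets uniformly.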

🔎 Similar Papers

Hybrid Top-Down Global Causal Discovery with Local Search for Linear and Nonlinear Additive Noise Models

2024-05-23 · Citations: 2

Causal discovery from conditionally stationary time-series

2021-10-12 · arXiv.org · Citations: 5
