🤖 AI Summary
Traditional continuous causal discovery relies on non-convex acyclicity constraints, which often lead to local optima and poor scalability in high dimensions. This work proposes a decoupled paradigm that shifts the bottleneck of causal discovery from structural optimization to score function estimation. It shows that the causal hierarchy leaves a geometric signature in the score function and introduces the Score-Schur Topological Sort (SSTS), which extracts a topological order directly from unconstrained generative models. Under linear conditions, iterative graph marginalization is equivalent to taking the Schur complement of the Score-Jacobian Information Matrix (SJIM), so acyclicity is handled by an algebraic procedure with a dominant cost of $O(d^3)$ rather than by explicit constrained optimization. For nonlinear systems, the authors formulate the expectation gap of Schur marginalization and introduce Block-SSTS, which compresses extraction depth to bound structural error. Experiments apply the method to nonlinear graphs with up to $d=1000$ variables, where accuracy is limited primarily by the finite-sample estimation variance of the score geometry.
📝 Abstract
Continuous causal discovery typically couples representation learning with structural optimization via non-convex acyclicity penalties, which subjects solvers to local optima and restricts scalability in high-dimensional regimes. We propose a decoupled paradigm that shifts the causal discovery bottleneck from non-convex optimization to statistical score estimation. We introduce the Score-Schur Topological Sort (SSTS), an algorithm that extracts topological order directly from unconstrained generative models, bypassing constrained structure optimization. We establish that the causal hierarchy leaves a geometric signature within the score function: iterative graph marginalization is mathematically equivalent to computing the Schur complement of the Score-Jacobian Information Matrix (SJIM) under linear conditions. This translates the acyclicity constraint into an algebraic procedure with a dominant cost of $O(d^3)$ operations. For non-linear systems, we formulate the expectation gap of Schur marginalization and introduce Block-SSTS to compress extraction depth, bounding structural error. Empirically, SSTS allows causal structural analysis on non-linear graphs up to $d=1000$. At this scale, our framework indicates that once the non-convex optimization bottleneck is mathematically bypassed, the structural fidelity of continuous causal discovery is bounded by the finite-sample estimation variance of the global score geometry. By reducing graph extraction to matrix operations, this work reframes scalable causal discovery from a constrained optimization problem to a statistical estimation challenge.
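
To make the linear-case claim concrete, here is a minimal sketch of ordering by repeated Schur complements. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not define the SJIM, so the sketch assumes a linear Gaussian model, where the score is $-\Theta x$ and the negative score Jacobian is the precision matrix $\Theta$. It further assumes equal noise variances, which makes a leaf the variable with the smallest diagonal entry of $\Theta$. That leaf rule, the function name `schur_topological_order`, and the example weights `W` are hypothetical choices made for this example. Marginalizing out variable $k$ replaces $\Theta$ by the Schur complement $\Theta_{RR} - \Theta_{Rk}\Theta_{kk}^{-1}\Theta_{kR}$, a standard Gaussian identity. Each step costs $O(d^2)$, so $d$ steps give the $O(d^3)$ total mentioned above.

```python
import numpy as np


def schur_topological_order(theta):
    """Return a causal order (roots first) from a precision matrix.

    Assumption (not from the paper): linear Gaussian SEM with equal noise
    variances, so a leaf of the current marginal graph has the smallest
    diagonal entry of the precision matrix.
    """
    theta = np.array(theta, dtype=float)
    remaining = list(range(theta.shape[0]))  # original variable indices
    leaves_first = []
    while remaining:
        k = int(np.argmin(np.diag(theta)))   # assumed leaf criterion
        leaves_first.append(remaining.pop(k))
        keep = np.arange(theta.shape[0]) != k
        # Schur complement: precision matrix of the marginal over kept variables.
        cross = np.outer(theta[keep, k], theta[k, keep]) / theta[k, k]
        theta = theta[keep][:, keep] - cross
    return leaves_first[::-1]


# Hypothetical 4-variable example; W[i, j] is the weight of edge i -> j.
# Edges: 2 -> 0, 0 -> 3, 2 -> 3, 3 -> 1, with unit noise variance.
W = np.zeros((4, 4))
W[2, 0], W[0, 3], W[2, 3], W[3, 1] = 0.9, -0.7, 0.5, 1.1

# Precision of x = W^T x + e with e ~ N(0, I): Theta = (I - W)(I - W)^T.
A = np.eye(4) - W
theta = A @ A.T

order = schur_topological_order(theta)
print(order)
```

On this toy graph the procedure recovers the order 2, 0, 3, 1, which is the only topological order consistent with the edges. With an estimated rather than exact $\Theta$, the argmin step becomes sensitive to estimation noise, which is in line with the abstract's point that accuracy is bounded by score-estimation variance.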