🤖 AI Summary
This work addresses the causal discovery problem of exactly recovering the directed acyclic graph (DAG) structure from observational data under linear Gaussian models. We propose BUILD, a bottom-up deterministic algorithm. First, we characterize structural properties of the precision matrix under equal-variance linear structural equation models (SEMs), enabling provably exact layerwise leaf-node identification and pruning. Second, we introduce a dynamic re-estimation strategy to enhance finite-sample robustness. BUILD integrates precision matrix analysis, graph-theoretic pruning, regularized inverse covariance estimation, and conditional independence testing. Theoretically, it guarantees exact DAG recovery when the true precision matrix is available, and it offers an explicit handle on computational complexity. Empirically, on challenging synthetic benchmarks, BUILD compares favorably to state-of-the-art DAG learning methods, with periodic re-estimation of the precision matrix trading runtime for robustness to estimation error.
📝 Abstract
Learning the structure of directed acyclic graphs (DAGs) from observational data is a central problem in causal discovery, statistical signal processing, and machine learning. Under a linear Gaussian structural equation model (SEM) with equal noise variances, the DAG is identifiable, and we show that the ensemble precision matrix of the observations exhibits a distinctive structure that facilitates DAG recovery. Exploiting this property, we propose BUILD (Bottom-Up Inference of Linear DAGs), a deterministic stepwise algorithm that identifies leaf nodes and their parents, then prunes the leaves by removing their incident edges before proceeding to the next step. Given the true precision matrix, this procedure reconstructs the DAG exactly. In practice, precision matrices must be estimated from finite data, and ill-conditioning may lead to error accumulation across BUILD steps. As a mitigation strategy, we periodically re-estimate the precision matrix (with fewer variables as leaves are pruned), trading off runtime for enhanced robustness. Reproducible results on challenging synthetic benchmarks demonstrate that BUILD compares favorably to state-of-the-art DAG learning algorithms, while offering an explicit handle on complexity.
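The bottom-up idea can be illustrated with a minimal sketch that works on an exact precision matrix. It is not the authors' implementation. It assumes the SEM is written as x = B x + z with z ~ N(0, sigma2 * I), where B[i, j] is the weight of edge j -> i, so the precision matrix is (I - B)^T (I - B) / sigma2. Under that convention a leaf has the smallest diagonal entry (exactly 1 / sigma2), and the off-diagonal entries of its row expose its parents. The function names, the tolerance `tol`, and the use of a Schur complement to marginalize out each leaf are illustrative assumptions and may differ from the pruning step in the paper.

```python
import numpy as np

def precision_from_sem(B, sigma2=1.0):
    """Precision matrix of x = B x + z, z ~ N(0, sigma2 * I).

    Assumed convention: B[i, j] is the weight of edge j -> i.
    """
    d = B.shape[0]
    M = np.eye(d) - B
    return (M.T @ M) / sigma2

def bottom_up_recover(theta, tol=1e-8):
    """Recover B from an exact precision matrix by repeated leaf removal."""
    d = theta.shape[0]
    B_hat = np.zeros((d, d))
    active = list(range(d))  # original indices of the nodes not yet pruned
    theta = theta.copy()
    while len(active) > 1:
        # A leaf has no children, so its diagonal entry is minimal (1 / sigma2).
        j = int(np.argmin(np.diag(theta)))
        # For a leaf, the off-diagonal part of row j equals -B[j, :] / sigma2,
        # so dividing by the diagonal entry cancels the unknown noise variance.
        weights = -theta[j] / theta[j, j]
        for k, w in enumerate(weights):
            if k != j and abs(w) > tol:
                B_hat[active[j], active[k]] = w
        # Marginalize the leaf out: the Schur complement is the precision
        # matrix of the remaining variables.
        keep = [k for k in range(len(active)) if k != j]
        theta = (theta[np.ix_(keep, keep)]
                 - np.outer(theta[keep, j], theta[j, keep]) / theta[j, j])
        active.pop(j)
    return B_hat

# Small example: edges 0 -> 1, 0 -> 2, 1 -> 2 with arbitrary weights.
B_true = np.zeros((3, 3))
B_true[1, 0] = 0.8
B_true[2, 0] = -0.5
B_true[2, 1] = 1.2
theta_true = precision_from_sem(B_true, sigma2=2.0)
B_rec = bottom_up_recover(theta_true)
```

With an estimated precision matrix the minimum-diagonal rule and the thresholding would both be affected by estimation error, which is the situation the periodic re-estimation described in the abstract is meant to mitigate.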