🤖 AI Summary
This paper addresses the problem of learning directed acyclic graphs (DAGs) from data generated by nonlinear additive noise models (ANMs) with Gaussian noise. We express each nonlinear function through a basis expansion and derive a maximum likelihood estimator with a group ℓ₀ penalty on the number of edges, formulated as a convex mixed-integer program. This formulation enables explicit control of edge sparsity, seamless integration of structural prior knowledge, and, via branch-and-bound, solutions that are verifiably optimal up to a user-specified gap. Theoretically, we establish consistency of graph recovery even when the number of variables grows with the sample size, and by connecting the optimality gap to our statistical error bounds, we derive an early stopping criterion that terminates branch-and-bound while preserving consistency. Experiments demonstrate that our approach significantly outperforms state-of-the-art DAG learning algorithms on both synthetic and real-world high-dimensional datasets. To the best of our knowledge, this is the first method to achieve consistent graph structure recovery under nonlinear ANMs while providing verifiable optimality within a user-specified precision.
📝 Abstract
We study the problem of learning a directed acyclic graph from data generated according to an additive, non-linear structural equation model with Gaussian noise. We express each non-linear function through a basis expansion and derive a maximum likelihood estimator with a group ℓ₀ regularization that penalizes the number of edges in the graph. The resulting estimator is formulated through a convex mixed-integer program, enabling the use of branch-and-bound methods to obtain a solution that is guaranteed to be accurate up to a pre-specified optimality gap. Our formulation can naturally encode background knowledge, such as the presence or absence of edges and partial order constraints among the variables. We establish consistency guarantees for our estimator in terms of graph recovery, even when the number of variables grows with the sample size. Additionally, by connecting the optimality guarantees with our statistical error bounds, we derive an early stopping criterion that allows terminating the branch-and-bound procedure while preserving consistency. Compared with existing approaches that either assume equal error variances, restrict to linear structural equation models, or rely on heuristic procedures, our method enjoys both optimization and statistical guarantees. Extensive simulations and real-data analysis show that the proposed method achieves markedly better graph recovery performance.
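To make the estimator concrete, here is a minimal, hedged sketch of the core idea on a toy 3-node nonlinear ANM: each candidate parent is expanded through a polynomial basis, every acyclic edge set is scored by a Gaussian log-likelihood plus a group-ℓ₀ penalty (a per-edge cost λ), and the minimizer is found by exhaustive enumeration standing in for the paper's branch-and-bound over the mixed-integer program. All specifics below (cubic basis, λ = 3 log n, the simulated link functions) are illustrative choices, not the paper's actual formulation.

```python
# Sketch: group-l0-penalized likelihood DAG search on a toy nonlinear ANM.
# Enumeration replaces branch-and-bound; basis/penalty choices are illustrative.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3

# Ground-truth ANM: x0 -> x1 -> x2 with polynomial link functions.
x0 = rng.normal(size=n)
x1 = x0**2 + 0.5 * x0 + 0.3 * rng.normal(size=n)
x2 = 0.5 * x1**2 - x1 + 0.3 * rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

def basis(v):
    """Cubic polynomial basis expansion of one parent variable."""
    return np.column_stack([v, v**2, v**3])

def node_rss(j, parents):
    """Residual sum of squares of node j regressed on its parents' bases."""
    y = X[:, j]
    if not parents:
        return float(np.sum((y - y.mean()) ** 2))
    B = np.column_stack([basis(X[:, p]) for p in parents] + [np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(np.sum((y - B @ coef) ** 2))

def is_dag(edges):
    """Kahn's algorithm: True iff the edge set is acyclic."""
    indeg = {v: 0 for v in range(d)}
    adj = {v: [] for v in range(d)}
    for i, j in edges:
        adj[i].append(j)
        indeg[j] += 1
    queue = [v for v in range(d) if indeg[v] == 0]
    seen = 0
    while queue:
        v = queue.pop()
        seen += 1
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return seen == d

lam = 3 * np.log(n)  # group-l0 penalty per edge (illustrative choice)
candidates = [(i, j) for i in range(d) for j in range(d) if i != j]
best_score, best_edges = np.inf, None
for r in range(len(candidates) + 1):
    for edges in itertools.combinations(candidates, r):
        if not is_dag(edges):
            continue
        # Gaussian negative log-likelihood (up to constants) + edge penalty.
        score = lam * len(edges)
        for j in range(d):
            pa = [i for (i, jj) in edges if jj == j]
            score += (n / 2) * np.log(node_rss(j, pa) / n)
        if score < best_score:
            best_score, best_edges = score, set(edges)

print(best_edges)  # estimated edge set
```

Note how the penalty acts on whole groups of basis coefficients: an edge is either present, with all its basis terms, or absent entirely, which is exactly what the group ℓ₀ term encodes in the mixed-integer formulation, where binary edge indicators switch coefficient groups on and off.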