🤖 AI Summary
This paper addresses Bayesian network structure learning under linear Gaussian structural equation models (SEMs), focusing on the computationally challenging ℓ₀-penalized maximum likelihood estimator.
Method: The paper proposes a coordinate descent algorithm to approximate this estimator; to the authors' knowledge, it is the first coordinate descent procedure for Bayesian network learning endowed with both optimality and finite-sample statistical consistency guarantees. The algorithm converges to a coordinate-wise minimum, and despite the nonconvexity of the loss, the objective value of its solution converges to the optimal objective value of the ℓ₀-penalized estimator as the sample size tends to infinity.
Results: Experiments on synthetic and real-world datasets demonstrate near-optimal solution quality and strong scalability, making the estimator practical for medium-sized networks. Theoretical and empirical results jointly establish the method as a principled, scalable, and statistically sound framework for ℓ₀-penalized SEM structure learning.
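For concreteness, one common way to write the ℓ₀-penalized estimator in this setting is sketched below; this assumes equal noise variances, so that the Gaussian log-likelihood reduces to a least-squares term, and the paper's exact formulation may differ:

$$
\hat{B} \in \operatorname*{arg\,min}_{B \,:\, \mathcal{G}(B)\ \text{is a DAG}} \; \frac{1}{2n} \lVert \mathbf{X} - \mathbf{X}B \rVert_F^2 + \lambda \lVert B \rVert_0,
$$

where $\mathbf{X} \in \mathbb{R}^{n \times p}$ stacks the $n$ observations, $B \in \mathbb{R}^{p \times p}$ is the weighted adjacency matrix of the network $\mathcal{G}(B)$, $\lVert B \rVert_0$ counts the nonzero entries of $B$, and $\lambda > 0$ sets the sparsity level.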
📝 Abstract
This paper studies the problem of learning Bayesian networks from continuous observational data generated according to a linear Gaussian structural equation model. We consider an $\ell_0$-penalized maximum likelihood estimator for this problem, which is known to have favorable statistical properties but is computationally challenging to solve, especially for medium-sized Bayesian networks. We propose a new coordinate descent algorithm to approximate this estimator and prove several remarkable properties of our procedure: the algorithm converges to a coordinate-wise minimum, and despite the non-convexity of the loss function, as the sample size tends to infinity, the objective value of the coordinate descent solution converges to the optimal objective value of the $\ell_0$-penalized maximum likelihood estimator. Finite-sample statistical consistency guarantees are also established. To the best of our knowledge, our proposal is the first coordinate descent procedure endowed with optimality and statistical guarantees in the context of learning Bayesian networks. Numerical experiments on synthetic and real data demonstrate that our coordinate descent method can obtain near-optimal solutions while being scalable.
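To illustrate the kind of update a coordinate descent scheme uses here, below is a minimal Python sketch of the per-node subproblem under a fixed topological ordering, where each single-coordinate move is an exact hard-thresholding step. This is a generic sketch of the technique, not the authors' algorithm: the function names are illustrative, and fixing an ordering sidesteps the acyclicity constraint that the actual method must handle.

```python
import numpy as np

def l0_cd_node(X_par, y, lam, max_sweeps=100):
    """Cyclic coordinate descent with exact hard-thresholding for
        min_beta (1/2n) ||y - X_par @ beta||^2 + lam * ||beta||_0.
    Each single-coordinate subproblem is solved exactly: the
    unpenalized minimizer b is kept only when the resulting drop
    in squared loss exceeds the l0 penalty lam."""
    n, p = X_par.shape
    beta = np.zeros(p)
    col_sq = (X_par ** 2).sum(axis=0)        # ||x_k||^2 for each column
    resid = y.copy()                         # residual at beta = 0
    for _ in range(max_sweeps):
        changed = False
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            r = resid + X_par[:, k] * beta[k]    # partial residual without k
            b = X_par[:, k] @ r / col_sq[k]      # unpenalized coordinate update
            # keep b only if the loss decrease exceeds the l0 penalty
            new_bk = b if col_sq[k] * b * b / (2 * n) > lam else 0.0
            if new_bk != beta[k]:
                resid = r - X_par[:, k] * new_bk
                beta[k] = new_bk
                changed = True
        if not changed:      # no coordinate improves: coordinate-wise minimum
            break
    return beta

def fit_dag_given_order(X, order, lam):
    """Assemble a weighted adjacency matrix B by solving the per-node
    subproblem against each node's predecessors in `order`. Acyclicity
    holds by construction here; the paper's method searches over
    structures rather than assuming an ordering."""
    n, p = X.shape
    B = np.zeros((p, p))
    for i, j in enumerate(order[1:], start=1):
        parents = order[:i]
        B[parents, j] = l0_cd_node(X[:, parents], X[:, j], lam)
    return B
```

The `changed` flag implements the stopping condition that matches the coordinate-wise minimum property stated above: a sweep terminates only when no single-coordinate update can further reduce the penalized objective.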