Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Integer programming (IP) approaches to Bayesian network structure learning (BNSL) suffer from a computational bottleneck in the pricing problem, caused by the exponential growth in the number of variables and constraints. Method: We propose a dynamic optimization framework based on row and column generation, the first to formulate the BNSL pricing problem as a difference-of-submodular optimization task and solve it efficiently via an inexact Difference of Convex Algorithm (DCA), circumventing the complexity limitations of exact pricing. Our method integrates ℓ₀-penalized likelihood scoring with column generation. Results: On continuous Gaussian data, it significantly improves solution quality over mainstream score-based methods, particularly for high-density graphs, and on large-scale graphs it matches state-of-the-art constraint-based and hybrid approaches. This work establishes a new scalable paradigm for BNSL.

📝 Abstract
In this paper, we consider a score-based Integer Programming (IP) approach for solving the Bayesian Network Structure Learning (BNSL) problem. State-of-the-art BNSL IP formulations suffer from an exponentially large number of variables and constraints. A standard approach in IP to address such challenges is to employ row and column generation techniques, which dynamically generate rows and columns, but the complex pricing problem remains a computational bottleneck for BNSL. For the general class of $\ell_0$-penalized likelihood scores, we show how the pricing problem can be reformulated as a difference-of-submodular optimization problem, and how the Difference of Convex Algorithm (DCA) can be applied as an inexact method to solve the pricing problems efficiently. Empirically, we show that, for continuous Gaussian data, our row and column generation approach yields higher-quality solutions than state-of-the-art score-based approaches, especially as graph density increases, and achieves performance comparable to benchmark constraint-based and hybrid approaches, even as graph size increases.
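The abstract's core algorithmic idea, applying DCA as an inexact solver, follows the standard DC scheme for minimizing f = g − h with g, h convex: at each step, linearize h at the current iterate and minimize the resulting convex surrogate. A minimal sketch of this generic iteration is below; the toy objective f(x) = x⁴ − 2x² is purely illustrative and is not the paper's pricing problem:

```python
def dca(g_argmin_linear, h_grad, x0, iters=50):
    """Generic DCA loop for minimizing f = g - h (g, h convex).

    Each iteration linearizes h at x_k and solves the convex subproblem
    argmin_x g(x) - h'(x_k) * x, supplied here as `g_argmin_linear(c)`
    returning argmin_x g(x) - c * x.
    """
    x = x0
    for _ in range(iters):
        x = g_argmin_linear(h_grad(x))
    return x

# Toy instance: f(x) = x^4 - 2x^2 with g(x) = x^4, h(x) = 2x^2.
# argmin_x x^4 - c*x  solves  4x^3 = c,  i.e.  x = (c/4)^(1/3) for c >= 0.
x_star = dca(lambda c: (c / 4) ** (1 / 3), lambda x: 4 * x, x0=0.5)
# x_star converges to the local minimizer x = 1
```

In the paper's setting, g and h arise from a DC decomposition of the difference-of-submodular pricing objective (e.g., via convex extensions of the submodular parts), and each convex subproblem replaces an exact, exponentially hard pricing solve.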
Problem

Research questions and friction points this paper is trying to address.

Addresses the exponentially large number of variables and constraints in BNSL IP formulations
Reformulates pricing as difference-of-submodular optimization problem
Improves solution quality for high-density graphs vs state-of-the-art
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses column generation for Bayesian Network learning
Reformulates the pricing problem as a difference-of-submodular optimization problem
Applies Difference of Convex Algorithm (DCA)
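The column-generation contribution listed above follows the usual restricted-master/pricing loop: solve a master problem over the columns (candidate parent sets) generated so far, then ask a pricing oracle for a new column with negative reduced cost, stopping when none exists. A hedged, schematic sketch with hypothetical `solve_master` and `price` callables (the paper's actual master LP and DCA-based pricer are not reproduced here):

```python
def column_generation(solve_master, price, initial_cols, max_rounds=100):
    """Generic column-generation loop.

    `solve_master(cols)` solves the restricted master problem over the
    current column pool and returns (duals, objective); `price(duals)`
    returns a new improving column, or None when no column with
    negative reduced cost exists (the termination criterion).
    """
    cols = list(initial_cols)
    value = None
    for _ in range(max_rounds):
        duals, value = solve_master(cols)
        new_col = price(duals)
        if new_col is None:
            break
        cols.append(new_col)
    return cols, value

# Toy run with stub callables: the "pricer" yields two columns, then stops.
pool = iter(["c1", "c2"])

def toy_master(cols):
    return None, len(cols)  # placeholder duals and objective

def toy_price(duals):
    return next(pool, None)

cols, value = column_generation(toy_master, toy_price, ["c0"])
# cols grows from ["c0"] to ["c0", "c1", "c2"], then the loop terminates
```

In the paper, exact pricing is the bottleneck this loop inherits; replacing `price` with the inexact DCA solver trades a certificate of optimal reduced cost for tractability.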
Yiran Yang
University of Chinese Academy of Sciences
Object detection, AIGC, Knowledge Distillation
Rui Chen
School of Data Science, The Chinese University of Hong Kong, Shenzhen