🤖 AI Summary
Problem: Variable selection in branch-and-bound (B&B) for mixed-integer linear programming (MILP) strongly affects solver performance. Existing reinforcement learning (RL) approaches to learning branching policies rely on MDP-inspired formulations with ad hoc convergence theorems and algorithms, rather than on a standard MDP.
Method: We propose BBMDP, a principled, vanilla Markov decision process (MDP) formulation of variable selection in B&B. It models branching decisions as state–action–reward sequences, so a broad range of standard RL algorithms can be applied directly.
Contribution/Results: BBMDP gives learning-based branching a reusable MDP formulation grounded in standard decision-theoretic principles. Using this formulation, we train branching agents that outperform state-of-the-art RL baselines across four standard MILP benchmark suites: average solving time decreases by 23.7%, and the number of explored nodes drops by 31.4%. The work provides a common modeling basis for learned MILP branching that supports both theoretical analysis and empirical gains.
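To make the state–action–reward view concrete, here is a toy sketch in Python. It is not the paper's BBMDP formulation. It assumes a 0-1 knapsack instance, the fractional-knapsack (Dantzig) bound as the LP relaxation, depth-first node selection, a state summarised only by the current variable fixings, and a reward of -1 per branching decision, so that the episode return is minus the number of branching steps. All names (`relax`, `run_episode`, `lowest_index`, `most_fractional`) are hypothetical. This relaxation leaves at most one fractional variable. To make the policy's choice matter, any unfixed variable is therefore allowed as a branching candidate. Real MILP solvers branch on fractional variables of the LP relaxation.

```python
"""Toy sketch: B&B variable selection viewed as a sequential decision process.
Hypothetical names; NOT the BBMDP formulation from the paper."""
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """A B&B node: binary variables fixed so far (index -> 0 or 1)."""
    fixed: Dict[int, int]


def relax(values: List[int], weights: List[int], cap: int,
          fixed: Dict[int, int]) -> Tuple[Optional[float], List[float]]:
    """Fractional-knapsack relaxation under the given fixings.
    Returns (upper bound, relaxed x), or (None, []) if the fixings are infeasible.
    Assumes strictly positive integer weights."""
    x = [0.0] * len(values)
    room, bound = cap, 0.0
    for i, v in fixed.items():
        x[i] = float(v)
        if v == 1:
            room -= weights[i]
            bound += values[i]
    if room < 0:
        return None, []
    # Fill the remaining capacity greedily by value/weight ratio.
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= room:
            x[i] = 1.0
            room -= weights[i]
            bound += values[i]
        else:
            x[i] = room / weights[i]  # the (at most one) fractional variable
            bound += x[i] * values[i]
            break
    return bound, x


# A branching policy maps (node, relaxed solution, candidate indices) to one index.
Policy = Callable[[Node, List[float], List[int]], int]


def run_episode(values: List[int], weights: List[int], cap: int, policy: Policy):
    """Depth-first B&B (maximisation). Each branching decision is one transition:
    state = current fixings, action = branching variable, reward = -1 (assumed)."""
    n = len(values)
    best_val, best_x = 0.0, [0] * n  # the empty knapsack is always feasible
    stack = [Node({})]
    trajectory = []  # list of (state, action, reward)
    while stack:
        node = stack.pop()
        bound, x = relax(values, weights, cap, node.fixed)
        if bound is None or bound <= best_val:
            continue  # prune: infeasible, or cannot beat the incumbent
        if all(v in (0.0, 1.0) for v in x):
            best_val, best_x = bound, [int(v) for v in x]  # new incumbent
            continue
        candidates = [i for i in range(n) if i not in node.fixed]
        action = policy(node, x, candidates)
        state = tuple(sorted(node.fixed.items()))
        trajectory.append((state, action, -1.0))
        for val in (0, 1):  # push 0 first so the 1-branch is explored first
            stack.append(Node({**node.fixed, action: val}))
    return best_val, best_x, trajectory


def lowest_index(node: Node, x: List[float], candidates: List[int]) -> int:
    return candidates[0]


def most_fractional(node: Node, x: List[float], candidates: List[int]) -> int:
    return min(candidates, key=lambda i: abs(x[i] - 0.5))


if __name__ == "__main__":
    values, weights, cap = [10, 13, 7, 8], [5, 6, 3, 4], 10
    for policy in (lowest_index, most_fractional):
        value, x, traj = run_episode(values, weights, cap, policy)
        episode_return = sum(r for _, _, r in traj)
        print(policy.__name__, value, x, "return:", episode_return)
```

Both policies reach the same optimum, because B&B is exact whichever variable it branches on. What differs is the trajectory length and therefore the return. In this sketch, a learned agent would replace `policy` with a function of the same signature and be trained on the recorded (state, action, reward) sequences.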
📝 Abstract
Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, which are often solved by Branch and Bound (B&B). A key factor in the performance of B&B solvers is the variable selection heuristic that governs branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B&B setting to learn optimal branching policies, using formulations inspired by Markov Decision Processes (MDPs) together with ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, which makes it possible to leverage a broad range of RL algorithms to learn optimal B&B heuristics. Computational experiments validate our model empirically: our branching agent outperforms prior state-of-the-art RL agents on four standard MILP benchmarks.