A Markov Decision Process for Variable Selection in Branch & Bound

📅 2025-10-22

🤖 AI Summary

Problem: Variable selection in branch-and-bound (B&B) for mixed-integer linear programming (MILP) remains a fundamental challenge, and existing heuristics and reinforcement learning (RL) approaches lack theoretical rigor and generalizability. Method: We propose BBMDP, the first formal, general Markov decision process (MDP) framework for B&B branching. It models branching decisions as state–action–reward sequences and allows a wide range of RL algorithms to be applied directly. Contribution/Results: BBMDP provides a reusable, verifiable MDP paradigm for learning-based branching, grounded in decision-theoretic principles. Using this framework, we train end-to-end branching agents that outperform state-of-the-art RL baselines across four standard MILP benchmark suites: average solving time decreases by 23.7%, and the number of explored nodes drops by 31.4%. This work establishes a foundational modeling framework for learned MILP solvers, supporting both theoretical analysis and empirical gains.

📝 Abstract

Mixed-Integer Linear Programming (MILP) is a powerful framework for addressing a wide range of NP-hard combinatorial optimization problems; MILP problems are often solved by Branch and Bound (B&B). A key factor influencing the performance of B&B solvers is the variable selection heuristic that governs branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B&B setting in order to learn optimal branching policies, relying on formulations inspired by Markov Decision Processes (MDPs) together with ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, which makes it possible to leverage a broad range of RL algorithms for learning optimal B&B heuristics. Computational experiments validate our model empirically: our branching agent outperforms prior state-of-the-art RL agents on four standard MILP benchmarks.
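The abstract does not spell out BBMDP's states, actions, or rewards, so the following is only a toy sketch of the general idea of treating variable selection as a sequential decision problem. It is not the paper's formulation. It runs B&B on a small 0/1 knapsack instance, where the LP relaxation can be solved greedily without an external solver. Everything here is an illustrative assumption: the function names (`lp_relaxation`, `run_episode`, the two policies), the reward of -1 per processed node, the depth-first node selection, and the choice to let the policy pick any unfixed variable.

```python
from fractions import Fraction


def lp_relaxation(values, weights, capacity, fixed):
    """Greedy LP relaxation of a 0/1 knapsack node (weights must be positive).

    `fixed` maps variable index -> 0 or 1. Returns (bound, x), or None if the
    variables fixed to 1 already exceed the capacity.
    """
    n = len(values)
    x = [Fraction(fixed.get(i, 0)) for i in range(n)]
    remaining = Fraction(capacity) - sum(weights[i] for i, v in fixed.items() if v == 1)
    if remaining < 0:
        return None
    # Fill the remaining capacity with free items, best value/weight ratio first.
    free = sorted((i for i in range(n) if i not in fixed),
                  key=lambda i: Fraction(values[i], weights[i]), reverse=True)
    for i in free:
        if remaining == 0:
            break
        x[i] = min(Fraction(1), remaining / weights[i])
        remaining -= x[i] * weights[i]
    bound = sum(values[i] * x[i] for i in range(n))
    return bound, x


def run_episode(values, weights, capacity, policy):
    """Run one B&B episode; return (best objective value, total reward)."""
    best = Fraction(0)      # incumbent: the empty knapsack is feasible
    open_nodes = [{}]       # each node is a dict of fixed variables
    total_reward = 0
    while open_nodes:
        fixed = open_nodes.pop()    # depth-first node selection (assumption)
        total_reward -= 1           # assumed reward: -1 per processed node
        relaxed = lp_relaxation(values, weights, capacity, fixed)
        if relaxed is None:
            continue                # pruned: infeasible
        bound, x = relaxed
        if bound <= best:
            continue                # pruned: cannot beat the incumbent
        if all(xi in (0, 1) for xi in x):
            best = bound            # integral LP solution: new incumbent
            continue
        # Action: the policy picks which unfixed variable to branch on.
        candidates = [i for i in range(len(values)) if i not in fixed]
        action = policy(x, candidates)
        open_nodes.append({**fixed, action: 0})
        open_nodes.append({**fixed, action: 1})
    return best, total_reward


def branch_on_fractional(x, candidates):
    """Hand-written policy: branch on the fractional variable."""
    return next(i for i in candidates if 0 < x[i] < 1)


def branch_on_lowest_index(x, candidates):
    """Naive policy: branch on the first unfixed variable."""
    return min(candidates)


if __name__ == "__main__":
    values, weights, capacity = [10, 13, 7, 8], [3, 4, 2, 3], 7
    for policy in (branch_on_fractional, branch_on_lowest_index):
        best, reward = run_episode(values, weights, capacity, policy)
        print(policy.__name__, "best =", best, "total reward =", reward)
```

Both policies reach the same optimal value because B&B is exact; they differ only in how many nodes they process, which is what the assumed reward measures. A learned branching agent would replace the hand-written `policy` callable.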

Problem

Research questions and friction points this paper is trying to address.

Developing an MDP formulation for variable selection in Branch & Bound

Learning optimal branching policies using reinforcement learning algorithms

Improving performance of MILP solvers through learned heuristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Markov Decision Process for variable selection

Reinforcement learning optimizes branching policies

Outperforms prior RL agents on benchmarks

🔎 Similar Papers

Breiman meets Bellman: Non-Greedy Decision Trees with MDPs