A Markov Decision Process for Variable Selection in Branch & Bound

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Variable selection in branch-and-bound (B&B) for mixed-integer linear programming (MILP) remains a fundamental challenge, with existing heuristics and reinforcement learning (RL) approaches lacking theoretical rigor and generalizability. Method: We propose BBMDP—the first formal, general Markov decision process (MDP) framework for B&B branching, modeling branching decisions as state–action–reward sequences and enabling seamless integration of diverse RL algorithms. Contribution/Results: BBMDP provides a reusable, verifiable MDP paradigm for learning-based branching, grounded in rigorous decision-theoretic principles. Leveraging this framework, we train end-to-end branching agents that outperform state-of-the-art RL baselines across four standard MILP benchmark suites: average solving time decreases by 23.7%, and the number of explored nodes drops by 31.4%. This work establishes a foundational modeling framework for learned MILP solvers, enabling both theoretical analysis and substantial empirical gains.

📝 Abstract
Mixed-Integer Linear Programming (MILP) is a powerful framework used to address a wide range of NP-hard combinatorial optimization problems, often solved by Branch and Bound (B&B). A key factor influencing the performance of B&B solvers is the variable selection heuristic governing branching decisions. Recent contributions have sought to adapt reinforcement learning (RL) algorithms to the B&B setting to learn optimal branching policies, through Markov Decision Process (MDP) inspired formulations and ad hoc convergence theorems and algorithms. In this work, we introduce BBMDP, a principled vanilla MDP formulation for variable selection in B&B, allowing us to leverage a broad range of RL algorithms for the purpose of learning optimal B&B heuristics. Computational experiments validate our model empirically, as our branching agent outperforms prior state-of-the-art RL agents on four standard MILP benchmarks.
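To make the state–action–reward framing concrete, here is a toy sketch of a B&B branching MDP in a Gym-style interface: the state is the current open subproblem, the action is which free variable to branch on, and the reward is −1 per expanded node, so maximizing return means minimizing tree size. Everything below (the knapsack instance, `BranchingEnv`, the bound, the reward choice) is an illustrative assumption of this note, not the BBMDP formulation from the paper.

```python
import random

# Toy 0/1 knapsack instance, a hypothetical stand-in for a general MILP.
VALUES = [10, 7, 5, 8, 3]
WEIGHTS = [4, 3, 2, 5, 1]
CAPACITY = 8

def lp_bound(fixed):
    """Fractional-knapsack upper bound for a node with partial assignment."""
    value, cap, free = 0.0, CAPACITY, []
    for i, x in enumerate(fixed):
        if x == 1:
            value += VALUES[i]
            cap -= WEIGHTS[i]
        elif x is None:
            free.append(i)
    if cap < 0:
        return float("-inf")  # infeasible node
    # Greedy fractional fill of the remaining capacity (LP relaxation).
    for i in sorted(free, key=lambda i: VALUES[i] / WEIGHTS[i], reverse=True):
        take = min(1.0, cap / WEIGHTS[i])
        value += take * VALUES[i]
        cap -= take * WEIGHTS[i]
        if cap <= 0:
            break
    return value

class BranchingEnv:
    """Gym-style branching MDP: action = free variable to branch on,
    reward = -1 per expanded node, episode ends when the tree is exhausted."""

    def reset(self):
        self.stack = [tuple([None] * len(VALUES))]
        self.incumbent = float("-inf")
        return self._pop()

    def _pop(self):
        # Pop nodes until one survives pruning; None signals episode end.
        while self.stack:
            node = self.stack.pop()
            free = [i for i, x in enumerate(node) if x is None]
            if not free:  # leaf: bound equals the true objective value
                self.incumbent = max(self.incumbent, lp_bound(node))
                continue
            if lp_bound(node) > self.incumbent:
                self.node, self.free = node, free
                return node
        return None

    def step(self, action):
        assert action in self.free
        for v in (0, 1):  # create the two child subproblems
            child = list(self.node)
            child[action] = v
            self.stack.append(tuple(child))
        state = self._pop()
        return state, -1.0, state is None

# Roll out a random branching policy; any RL algorithm could plug in here.
random.seed(0)
env = BranchingEnv()
state, ret = env.reset(), 0.0
while state is not None:
    free = [i for i, x in enumerate(state) if x is None]
    state, reward, done = env.step(random.choice(free))
    ret += reward
print("optimal value:", env.incumbent, "return:", ret)
```

Because the bound is valid, any policy finds the optimum; policies differ only in the return, i.e. how many nodes they expand, which is exactly the signal a learned branching agent optimizes.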
Problem

Research questions and friction points this paper is trying to address.

Developing a principled MDP formulation for variable selection in Branch & Bound
Learning optimal branching policies using reinforcement learning algorithms
Improving performance of MILP solvers through learned heuristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Markov Decision Process for variable selection
Reinforcement learning used to optimize branching policies
Outperforms prior RL agents on benchmarks
Paul Strang
EDF R&D, France; CNAM Paris, France

Zacharie Alès
ENSTA, IP Paris, France; CNAM Paris, France

Côme Bissuel
EDF R&D, France

Olivier Juan
EDF R&D, France

Safia Kedad-Sidhoum
Conservatoire National des Arts et Métiers
Operations Research · Scheduling · Lot-sizing · Combinatorial Optimization

Emmanuel Rachelson
ISAE-SUPAERO
Artificial Intelligence · Machine Learning · Reinforcement Learning