Learning Branching Policies for MILPs with Proximal Policy Optimization

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing learning-based branching policies for mixed-integer linear programming (MILP) generalize poorly, particularly on structurally diverse or out-of-distribution (OOD) instances, because they rely on imitation learning and static state representations. Method: We propose Tree-Gate Proximal Policy Optimization (TGPPO), a reinforcement learning framework that replaces imitation learning with proximal policy optimization (PPO) and introduces a context-aware, dynamically parameterized state encoder grounded in the search-tree structure. This yields a learnable, adaptive branching policy that is sensitive to tree topology and node semantics. Contribution/Results: By coupling tree-structure awareness with policy-gradient optimization, TGPPO improves cross-instance transferability. Experiments on multiple MILP benchmarks show reductions in the number of explored nodes and stronger p-Primal-Dual Integral (PDI) scores than state-of-the-art learned branching methods, with the largest gains on OOD instances.
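For concreteness, here is a minimal PyTorch sketch of the clipped PPO surrogate objective as it would apply to a branching (variable-selection) policy, where each action picks one of the K candidate variables at the current node. This is an illustrative reconstruction under standard PPO assumptions, not the paper's code; the `policy` interface, feature shapes, and `clip_eps` value are assumptions.

```python
import torch
import torch.nn.functional as F

def ppo_branching_loss(policy, tree_ctx, cand_feats, actions,
                       old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate for a variable-selection (branching) policy.

    tree_ctx:      (B, d_tree)    search-tree/node context features (assumed)
    cand_feats:    (B, K, d_cand) features of the K branching candidates
    actions:       (B,)           index of the variable actually branched on
    old_log_probs: (B,)           log pi_old(a|s), stored when the rollout ran
    advantages:    (B,)           advantage estimates (e.g. from GAE)
    """
    logits = policy(tree_ctx, cand_feats)                 # (B, K) scores
    new_log_probs = F.log_softmax(logits, dim=-1).gather(
        1, actions.unsqueeze(1)).squeeze(1)               # log pi_theta(a|s)

    ratio = torch.exp(new_log_probs - old_log_probs)      # pi_theta / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()          # negate: we minimize
```

The clipping is what distinguishes PPO from plain policy gradient: it keeps each update close to the data-collecting policy, which matters when rollouts (full B&B solves) are expensive to gather.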

📝 Abstract
Branch-and-Bound (B&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILP), yet its exponential time complexity poses significant challenges for large-scale instances. The growing capabilities of machine learning have spurred efforts to improve B&B by learning data-driven branching policies. However, most existing approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances. In this work, we propose Tree-Gate Proximal Policy Optimization (TGPPO), a novel framework that employs Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to train a branching policy aimed at improving generalization across heterogeneous MILP instances. Our approach builds on a parameterized state space representation that dynamically captures the evolving context of the search tree. Empirical evaluations show that TGPPO often outperforms existing learning-based policies in terms of reducing the number of nodes explored and improving p-Primal-Dual Integrals (PDI), particularly on out-of-distribution instances. These results highlight the potential of RL to develop robust and adaptable branching strategies for MILP solvers.
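For reference, the p-Primal-Dual Integral used as an evaluation metric is commonly defined (e.g. in the ML4CO benchmark convention) as the integral over solve time of the gap between the incumbent primal bound and the global dual bound; lower is better, since it rewards closing the gap early. A hedged rendering of that standard definition, which may differ from any normalized variant the paper uses:

```latex
% Primal-dual integral up to time limit T (lower is better), assuming
%   p(t) = best primal (incumbent) objective value at time t
%   d(t) = best global dual bound at time t
\mathrm{PDI}(T) \;=\; \int_{0}^{T} \bigl(\, p(t) - d(t) \,\bigr)\, \mathrm{d}t
```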
Problem

Research questions and friction points this paper is trying to address.

Improving MILP branching policies using reinforcement learning
Addressing generalization issues in imitation learning approaches
Reducing node exploration and enhancing solution efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Proximal Policy Optimization for branching policies
Dynamic state representation captures the search-tree context (see the sketch after this list)
Reinforcement Learning improves generalization across instances
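As a rough illustration of the second point, the sketch below shows one way a dynamically parameterized, tree-aware encoding could feed a branching policy: a global tree-context vector produces a multiplicative gate over per-candidate variable features, so the same variable can score differently at different search-tree nodes. The class name, layer sizes, and gating form are hypothetical and are not claimed to match TGPPO's actual architecture.

```python
import torch
import torch.nn as nn

class TreeGateEncoder(nn.Module):
    """Hypothetical tree-gated candidate scorer (illustration only)."""

    def __init__(self, d_tree, d_cand, d_hidden=64):
        super().__init__()
        # Gate derived from global search-tree context (depth, gap, bounds, ...)
        self.gate = nn.Sequential(nn.Linear(d_tree, d_hidden), nn.Sigmoid())
        self.embed = nn.Linear(d_cand, d_hidden)   # per-candidate embedding
        self.score = nn.Linear(d_hidden, 1)        # scalar score per candidate

    def forward(self, tree_ctx, cand_feats):
        # tree_ctx: (B, d_tree); cand_feats: (B, K, d_cand)
        g = self.gate(tree_ctx).unsqueeze(1)        # (B, 1, d_hidden) gate
        h = torch.relu(self.embed(cand_feats)) * g  # context-gated features
        return self.score(h).squeeze(-1)            # (B, K) branching logits
```

A module like this could serve as the `policy` argument in the PPO loss sketched earlier; the gate is what lets variable scoring adapt to the evolving tree state instead of being static.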
Abdelouahed Ben Mhamed
AI Movement - International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnic, Rabat, Morocco
Assia Kamal-Idrissi
AI Movement - International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnic, Rabat, Morocco
Amal El Fallah Seghrouchni
Full professor
Artificial Intelligence · Autonomous agents · Multi-Agent Systems · Ambient intelligence