Learning Branching Policies for MILPs with Proximal Policy Optimization

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing learning-based branching policies for mixed-integer linear programming (MILP) generalize poorly, particularly on structurally diverse or out-of-distribution (OOD) instances, because they rely on imitation learning and static state representations. Method: We propose Tree-Gate Proximal Policy Optimization (TGPPO), a reinforcement learning framework that replaces imitation learning with proximal policy optimization (PPO) and introduces a context-aware, dynamically parameterized state encoder grounded in the search-tree structure. This yields a learnable, adaptive branching policy that is sensitive to tree topology and node semantics. Contribution/Results: By coupling tree-structure awareness with policy-gradient optimization, TGPPO improves cross-instance transferability. Experiments on multiple MILP benchmarks show reductions in the number of explored nodes and stronger p-Primal-Dual Integral (PDI) scores than state-of-the-art learned branching methods, with the largest gains on OOD instances.
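For concreteness, here is a minimal PyTorch sketch of the clipped PPO surrogate objective as it would apply to a branching (variable-selection) policy, where each action picks one of the K candidate variables at the current node. This is an illustrative reconstruction under standard PPO assumptions, not the paper's code; the `policy` interface, feature shapes, and `clip_eps` value are assumptions.

```python
import torch
import torch.nn.functional as F

def ppo_branching_loss(policy, tree_ctx, cand_feats, actions,
                       old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate for a variable-selection (branching) policy.

    tree_ctx:      (B, d_tree)    search-tree/node context features (assumed)
    cand_feats:    (B, K, d_cand) features of the K branching candidates
    actions:       (B,)           index of the variable actually branched on
    old_log_probs: (B,)           log pi_old(a|s), stored when the rollout ran
    advantages:    (B,)           advantage estimates (e.g. from GAE)
    """
    logits = policy(tree_ctx, cand_feats)                 # (B, K) scores
    new_log_probs = F.log_softmax(logits, dim=-1).gather(
        1, actions.unsqueeze(1)).squeeze(1)               # log pi_theta(a|s)

    ratio = torch.exp(new_log_probs - old_log_probs)      # pi_theta / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()          # negate: we minimize
```

The clipping is what distinguishes PPO from plain policy gradient: it keeps each update close to the data-collecting policy, which matters when rollouts (full B&B solves) are expensive to gather.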

📝 Abstract
Branch-and-Bound (B&B) is the dominant exact solution method for Mixed Integer Linear Programs (MILP), yet its exponential time complexity poses significant challenges for large-scale instances. The growing capabilities of machine learning have spurred efforts to improve B&B by learning data-driven branching policies. However, most existing approaches rely on Imitation Learning (IL), which tends to overfit to expert demonstrations and struggles to generalize to structurally diverse or unseen instances. In this work, we propose Tree-Gate Proximal Policy Optimization (TGPPO), a novel framework that employs Proximal Policy Optimization (PPO), a Reinforcement Learning (RL) algorithm, to train a branching policy aimed at improving generalization across heterogeneous MILP instances. Our approach builds on a parameterized state space representation that dynamically captures the evolving context of the search tree. Empirical evaluations show that TGPPO often outperforms existing learning-based policies in terms of reducing the number of nodes explored and improving p-Primal-Dual Integrals (PDI), particularly on out-of-distribution instances. These results highlight the potential of RL to develop robust and adaptable branching strategies for MILP solvers.
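For reference, the p-Primal-Dual Integral used as an evaluation metric is commonly defined (e.g. in the ML4CO benchmark convention) as the integral over solve time of the gap between the incumbent primal bound and the global dual bound; lower is better, since it rewards closing the gap early. A hedged rendering of that standard definition, which may differ from any normalized variant the paper uses:

```latex
% Primal-dual integral up to time limit T (lower is better), assuming
%   p(t) = best primal (incumbent) objective value at time t
%   d(t) = best global dual bound at time t
\mathrm{PDI}(T) \;=\; \int_{0}^{T} \bigl(\, p(t) - d(t) \,\bigr)\, \mathrm{d}t
```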
Problem

Research questions and friction points this paper is trying to address.

Improving MILP branching policies using reinforcement learning
Addressing generalization issues in imitation learning approaches
Reducing node exploration and enhancing solution efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Proximal Policy Optimization for branching policies
Dynamic state representation captures the search-tree context (see the sketch after this list)
Reinforcement Learning improves generalization across instances
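As a rough illustration of the second point, the sketch below shows one way a dynamically parameterized, tree-aware encoding could feed a branching policy: a global tree-context vector produces a multiplicative gate over per-candidate variable features, so the same variable can score differently at different search-tree nodes. The class name, layer sizes, and gating form are hypothetical and are not claimed to match TGPPO's actual architecture.

```python
import torch
import torch.nn as nn

class TreeGateEncoder(nn.Module):
    """Hypothetical tree-gated candidate scorer (illustration only)."""

    def __init__(self, d_tree, d_cand, d_hidden=64):
        super().__init__()
        # Gate derived from global search-tree context (depth, gap, bounds, ...)
        self.gate = nn.Sequential(nn.Linear(d_tree, d_hidden), nn.Sigmoid())
        self.embed = nn.Linear(d_cand, d_hidden)   # per-candidate embedding
        self.score = nn.Linear(d_hidden, 1)        # scalar score per candidate

    def forward(self, tree_ctx, cand_feats):
        # tree_ctx: (B, d_tree); cand_feats: (B, K, d_cand)
        g = self.gate(tree_ctx).unsqueeze(1)        # (B, 1, d_hidden) gate
        h = torch.relu(self.embed(cand_feats)) * g  # context-gated features
        return self.score(h).squeeze(-1)            # (B, K) branching logits
```

A module like this could serve as the `policy` argument in the PPO loss sketched earlier; the gate is what lets variable scoring adapt to the evolving tree state instead of being static.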
Abdelouahed Ben Mhamed
AI Movement - International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnic, Rabat, Morocco
Assia Kamal-Idrissi
AI Movement - International Artificial Intelligence Center of Morocco, University Mohammed VI Polytechnic, Rabat, Morocco
Amal El Fallah Seghrouchni
Full professor
Artificial Intelligence · Autonomous agents · Multi-Agent Systems · Ambient intelligence