🤖 AI Summary
Branching variable selection in mixed-integer linear programming (MILP) via branch-and-bound (B&B) suffers from poor generalization; existing learning-based approaches are limited either by reliance on high-quality expert demonstrations (imitation learning) or by sparse rewards and difficulties in modeling dynamic state evolution (reinforcement learning).
Method: We propose a novel deep reinforcement learning framework that jointly models the structural and temporal dynamics of the branching process. It introduces (1) “revived trajectories” to explicitly capture the evolutionary structure and sequential dependencies of the search tree, and (2) an importance-weighted reward redistribution mechanism to mitigate reward sparsity and enhance cross-problem generalization. The framework integrates graph neural networks with policy networks for end-to-end learning over dynamically evolving search trees.
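In spirit, a revived trajectory pairs each branching decision with the graph state in which it was taken, ordered along a search-tree path. The sketch below is a minimal, hypothetical container illustrating that idea; the class and field names are assumptions for illustration, not the paper's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class RevivedTrajectory:
    """Hypothetical container pairing each branching decision with the
    graph state it was taken in, ordered along one search-tree path."""
    steps: list = field(default_factory=list)  # list of (graph_state, decision)

    def record(self, graph_state, decision):
        # Preserve the explicit historical correspondence between a
        # state and the branching decision made in it.
        self.steps.append((graph_state, decision))

# Toy usage: states are opaque dicts, decisions are variable indices.
traj = RevivedTrajectory()
traj.record({"node": "root"}, 3)
traj.record({"node": "left-child"}, 7)
```

A sequence model or GNN-based policy could then consume `traj.steps` to learn from the full structural and temporal evolution of the search, rather than from isolated state-action pairs.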
Results: Experiments on multiple MILP benchmarks demonstrate substantial improvements over prior RL methods: average reductions of 4.0% in branch nodes and 2.2% in LP iterations on large-scale instances, confirming both effectiveness and robustness.
📝 Abstract
The branch-and-bound (B&B) algorithm is the main solver for Mixed Integer Linear Programs (MILPs), where branching variable selection is essential to computational efficiency. However, traditional branching heuristics often fail to generalize across heterogeneous problem instances, while existing learning-based methods face their own limitations: imitation learning (IL) depends on the quality of expert demonstrations, and reinforcement learning (RL) struggles with sparse rewards and the difficulty of representing dynamically evolving states. To address these issues, we propose ReviBranch, a novel deep RL framework that constructs revived trajectories by reconstructing explicit historical correspondences between branching decisions and their corresponding graph states along search-tree paths. During training, ReviBranch enables agents to learn from the complete structural evolution and temporal dependencies of the branching process. Additionally, we introduce an importance-weighted reward redistribution mechanism that transforms sparse terminal rewards into dense stepwise feedback, addressing the sparse-reward challenge. Extensive experiments on diverse MILP benchmarks demonstrate that ReviBranch outperforms state-of-the-art RL methods, reducing B&B nodes by 4.0% and LP iterations by 2.2% on large-scale instances. These results highlight the robustness and generalizability of ReviBranch across heterogeneous MILP problem classes.
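The importance-weighted reward redistribution idea can be illustrated in its simplest form: a single terminal reward is spread over the trajectory's steps in proportion to per-step importance weights, so the agent receives dense stepwise feedback while the total reward is conserved. This is a minimal sketch under assumed details; the function name and the particular weighting scheme are illustrative, not the paper's exact mechanism.

```python
import numpy as np

def redistribute_reward(terminal_reward, importance_weights):
    """Spread a sparse terminal reward across trajectory steps in
    proportion to (hypothetical) per-step importance weights."""
    w = np.asarray(importance_weights, dtype=float)
    w = w / w.sum()  # normalize so the redistributed rewards sum to the original
    return terminal_reward * w

# Toy trajectory: 4 branching decisions, later ones weighted as more important.
dense = redistribute_reward(-8.0, [1.0, 1.0, 2.0, 4.0])
# → array([-1., -1., -2., -4.]), which sums back to the terminal reward -8.0
```

Because the per-step signal is nonzero everywhere, standard policy-gradient updates can assign credit to individual branching decisions instead of waiting for the end of the B&B search.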