ReviBranch: Deep Reinforcement Learning for Branch-and-Bound with Revived Trajectories

📅 2025-08-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Branching variable selection in mixed-integer linear programming (MILP) via branch-and-bound (B&B) suffers from poor generalization; existing learning-based approaches are limited either by reliance on high-quality expert demonstrations (imitation learning) or by sparse rewards and difficulties in modeling dynamic state evolution (reinforcement learning). Method: We propose a novel deep reinforcement learning framework that jointly models the structural and temporal dynamics of the branching process. It introduces (1) "revived trajectories" to explicitly capture the evolutionary structure and sequential dependencies of the search tree, and (2) an importance-weighted reward redistribution mechanism to mitigate reward sparsity and enhance cross-problem generalization. The framework integrates graph neural networks with policy networks for end-to-end learning over dynamically evolving search trees. Results: Experiments on multiple MILP benchmarks demonstrate substantial improvements over prior RL methods: average reductions of 4.0% in B&B nodes and 2.2% in LP iterations on large-scale instances, confirming both effectiveness and robustness.

📝 Abstract
The branch-and-bound (B&B) algorithm is the main solver for mixed-integer linear programs (MILPs), where the selection of branching variables is essential to computational efficiency. However, traditional branching heuristics often fail to generalize across heterogeneous problem instances, while existing learning-based methods have their own limitations: imitation learning (IL) depends on the quality of expert demonstrations, and reinforcement learning (RL) struggles with sparse rewards and dynamic state representation. To address these issues, we propose ReviBranch, a novel deep RL framework that constructs revived trajectories by reviving explicit historical correspondences between branching decisions and their corresponding graph states along search-tree paths. During training, ReviBranch enables agents to learn from the complete structural evolution and temporal dependencies within the branching process. Additionally, we introduce an importance-weighted reward redistribution mechanism that transforms sparse terminal rewards into dense stepwise feedback, addressing the sparse-reward challenge. Extensive experiments on different MILP benchmarks demonstrate that ReviBranch outperforms state-of-the-art RL methods, reducing B&B nodes by 4.0% and LP iterations by 2.2% on large-scale instances. These results highlight the robustness and generalizability of ReviBranch across heterogeneous MILP problem classes.
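The "revived trajectories" idea pairs each branching decision with the graph state it was taken in, recovered along root-to-leaf paths of the search tree. The paper's actual data structures are not reproduced here; a minimal sketch with a hypothetical `Node` layout:

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

@dataclass
class Node:
    """One node of the B&B search tree (hypothetical layout)."""
    graph_state: Any                  # bipartite variable-constraint graph at this node
    branch_var: Optional[int] = None  # variable branched on at this node, if any
    parent: Optional["Node"] = None

def revive_trajectory(leaf: Node) -> List[Tuple[Any, int]]:
    """Walk from a leaf back to the root, recovering the ordered
    (graph state, branching decision) pairs along the search-tree path."""
    path = []
    node = leaf
    while node is not None:
        if node.branch_var is not None:
            path.append((node.graph_state, node.branch_var))
        node = node.parent
    return list(reversed(path))  # root-to-leaf order

# Example: the root branches on x3, its child on x7, reaching a leaf.
root = Node("g0", branch_var=3)
child = Node("g1", branch_var=7, parent=root)
leaf = Node("g2", parent=child)
print(revive_trajectory(leaf))  # [('g0', 3), ('g7', 7)] in root-to-leaf order
```

Training on such sequences, rather than on isolated (state, action) pairs, is what lets the agent model the structural evolution the abstract describes.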
Problem

Research questions and friction points this paper is trying to address.

Improving branching variable selection in branch-and-bound for MILPs
Addressing sparse rewards and dynamic state challenges in RL
Overcoming generalization limitations of traditional branching heuristics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep RL with revived historical branching trajectories
Importance-weighted reward redistribution for dense feedback
Learning from structural evolution and temporal dependencies
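The reward redistribution step turns a single terminal reward into per-step feedback. The paper's exact weighting scheme is not given here; a minimal sketch, assuming hypothetical per-step importance scores that are normalized so the dense rewards conserve the terminal return:

```python
import numpy as np

def redistribute_reward(terminal_reward: float, importance_weights) -> np.ndarray:
    """Spread a sparse terminal reward across an episode's steps in
    proportion to per-step importance weights (hypothetical scores,
    e.g. reflecting each branching decision's estimated contribution)."""
    w = np.asarray(importance_weights, dtype=float)
    w = w / w.sum()  # normalize so the stepwise rewards sum to the terminal reward
    return terminal_reward * w

# Example: a 4-step episode whose only signal is the terminal reward -8.0
# (e.g. negative node count); step 2 is judged most important.
dense = redistribute_reward(-8.0, [1.0, 3.0, 2.0, 2.0])
print(dense)        # stepwise rewards, summing to -8.0
print(dense.sum())  # -8.0
```

The key invariant is that redistribution only reshapes the credit assignment; the episode's total return is unchanged.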
Dou Jiabao
Hong Kong Baptist University
Nie Jiayi
Hong Kong Baptist University
Yihang Cheng
Central China Normal University
Jinwei Liu
Hong Kong Baptist University
Yingrui Ji
University of Chinese Academy of Sciences
Canran Xiao
Central South University
Feixiang Du
Tongling University
Jiaping Xiao
Nanyang Technological University
Cyber-Physical Systems · Intelligent Systems · Multirobot Learning · Artificial Intelligence