🤖 AI Summary
Branching variable selection in mixed-integer linear programming (MILP) via branch-and-bound (B&B) suffers from poor generalization; existing learning-based approaches are limited either by reliance on high-quality expert demonstrations (imitation learning) or by sparse rewards and difficulties in modeling dynamic state evolution (reinforcement learning).
Method: We propose a novel deep reinforcement learning framework that jointly models the structural and temporal dynamics of the branching process. It introduces (1) “revived trajectories” to explicitly capture the evolutionary structure and sequential dependencies of the search tree, and (2) an importance-weighted reward redistribution mechanism to mitigate reward sparsity and enhance cross-problem generalization. The framework integrates graph neural networks with policy networks for end-to-end learning over dynamically evolving search trees.
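In spirit, a revived trajectory pairs each branching decision with the graph state in which it was taken, ordered along a search-tree path. The sketch below is a minimal, hypothetical container illustrating that idea; the class and field names are assumptions for illustration, not the paper's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class RevivedTrajectory:
    """Hypothetical container pairing each branching decision with the
    graph state it was taken in, ordered along one search-tree path."""
    steps: list = field(default_factory=list)  # list of (graph_state, decision)

    def record(self, graph_state, decision):
        # Preserve the explicit historical correspondence between a
        # state and the branching decision made in it.
        self.steps.append((graph_state, decision))

# Toy usage: states are opaque dicts, decisions are variable indices.
traj = RevivedTrajectory()
traj.record({"node": "root"}, 3)
traj.record({"node": "left-child"}, 7)
```

A sequence model or GNN-based policy could then consume `traj.steps` to learn from the full structural and temporal evolution of the search, rather than from isolated state-action pairs.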
Results: Experiments on multiple MILP benchmarks demonstrate substantial improvements over prior RL methods: average reductions of 4.0% in branch nodes and 2.2% in LP iterations on large-scale instances, confirming both effectiveness and robustness.
📝 Abstract
The branch-and-bound (B&B) algorithm is the main solver for Mixed Integer Linear Programs (MILPs), where branching variable selection is essential to computational efficiency. However, traditional branching heuristics often fail to generalize across heterogeneous problem instances, while existing learning-based methods face their own limitations: imitation learning (IL) depends on the quality of expert demonstrations, and reinforcement learning (RL) struggles with sparse rewards and the difficulty of representing dynamically evolving states. To address these issues, we propose ReviBranch, a novel deep RL framework that constructs revived trajectories by reconstructing explicit historical correspondences between branching decisions and their corresponding graph states along search-tree paths. During training, ReviBranch enables agents to learn from the complete structural evolution and temporal dependencies of the branching process. Additionally, we introduce an importance-weighted reward redistribution mechanism that transforms sparse terminal rewards into dense stepwise feedback, addressing the sparse-reward challenge. Extensive experiments on diverse MILP benchmarks demonstrate that ReviBranch outperforms state-of-the-art RL methods, reducing B&B nodes by 4.0% and LP iterations by 2.2% on large-scale instances. These results highlight the robustness and generalizability of ReviBranch across heterogeneous MILP problem classes.
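The importance-weighted reward redistribution idea can be illustrated in its simplest form: a single terminal reward is spread over the trajectory's steps in proportion to per-step importance weights, so the agent receives dense stepwise feedback while the total reward is conserved. This is a minimal sketch under assumed details; the function name and the particular weighting scheme are illustrative, not the paper's exact mechanism.

```python
import numpy as np

def redistribute_reward(terminal_reward, importance_weights):
    """Spread a sparse terminal reward across trajectory steps in
    proportion to (hypothetical) per-step importance weights."""
    w = np.asarray(importance_weights, dtype=float)
    w = w / w.sum()  # normalize so the redistributed rewards sum to the original
    return terminal_reward * w

# Toy trajectory: 4 branching decisions, later ones weighted as more important.
dense = redistribute_reward(-8.0, [1.0, 1.0, 2.0, 4.0])
# → array([-1., -1., -2., -4.]), which sums back to the terminal reward -8.0
```

Because the per-step signal is nonzero everywhere, standard policy-gradient updates can assign credit to individual branching decisions instead of waiting for the end of the B&B search.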