Learning to Cut: Reinforcement Learning for Benders Decomposition

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the slow convergence of Benders decomposition in two-stage stochastic programming, which typically worsens as cutting planes accumulate and make the master problem larger and harder to solve. To overcome this limitation, the study introduces a reinforcement learning-based framework in which a neural network policy adaptively selects cuts. The policy is trained with the REINFORCE algorithm to learn from data which Benders cuts to retain and which redundant ones to discard. On a two-stage stochastic electric vehicle charging station location problem, the method substantially improves computational efficiency over both classical Benders decomposition and LearnBD, a supervised learning baseline that classifies cuts with a support vector machine. The approach also generalizes well to problems with similar structure but different data inputs and decision variable dimensions.
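
To make the "accumulating cuts" issue concrete, here is a minimal single-cut Benders (L-shaped) loop on a hypothetical one-dimensional capacity problem, not the paper's charging station model. The master problem's cut list grows by one cut per iteration; the optional `keep` argument is a hypothetical hook where a cut-selection rule (hand-written, a supervised classifier as in LearnBD, or a learned policy as in RLBD) would prune that list. Because x is a scalar, the master is solved exactly by enumerating breakpoints instead of calling an LP/MIP solver, and subgradients are computed analytically rather than from second-stage duals. All names and data are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical instance: buy capacity x in [0, upper] at unit cost c; scenario s
# has demand d_s with probability p_s; unmet demand costs q per unit, so the
# second-stage cost is Q(x, d) = q * max(d - x, 0).

def recourse(x, demands, probs, q):
    """Expected second-stage cost at x and one subgradient of it."""
    value = sum(p * q * max(d - x, 0.0) for d, p in zip(demands, probs))
    slope = -q * sum(p for d, p in zip(demands, probs) if d > x)
    return value, slope

def solve_master(cuts, c, upper):
    """Minimise c*x + theta s.t. theta >= a + b*x for every cut (a, b), theta >= 0.

    In one dimension the optimum sits at a breakpoint, so we enumerate the
    bounds of x and every pairwise intersection of the cut lines."""
    lines = cuts + [(0.0, 0.0)]  # theta >= 0 is valid because Q >= 0
    candidates = {0.0, upper}
    for (a1, b1), (a2, b2) in combinations(lines, 2):
        if b1 != b2:
            x = (a2 - a1) / (b1 - b2)
            if 0.0 <= x <= upper:
                candidates.add(x)

    def objective(x):
        return c * x + max(a + b * x for a, b in lines)

    x_best = min(candidates, key=objective)
    return x_best, objective(x_best)

def benders(demands, probs, c, q, upper, keep=None, tol=1e-6, max_iter=100):
    """Returns (incumbent x, its cost, iterations used).

    keep(cuts, x) is a hypothetical cut-selection hook; None keeps every cut."""
    cuts, x = [], 0.0
    best_x, best_ub = None, float("inf")
    for it in range(1, max_iter + 1):
        value, slope = recourse(x, demands, probs, q)
        if c * x + value < best_ub:          # x is feasible: upper bound
            best_x, best_ub = x, c * x + value
        # Optimality cut theta >= value + slope*(x' - x), stored as (a, b).
        cuts.append((value - slope * x, slope))
        if keep is not None:
            cuts = keep(cuts, x)
        x, lower_bound = solve_master(cuts, c, upper)  # relaxation: lower bound
        if best_ub - lower_bound <= tol:
            return best_x, best_ub, it
    return best_x, best_ub, max_iter
```

On the toy call `benders([2.0, 5.0, 8.0], [0.3, 0.4, 0.3], c=1.0, q=3.0, upper=10.0)` the gap closes after three iterations at x = 5 with cost 7.7 (up to floating-point rounding). Any subset of valid cuts still yields a relaxation, so pruning through `keep` never invalidates the lower bound; the price of dropping a useful cut is extra iterations, which is the trade-off a cut-selection policy has to learn.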

📝 Abstract

Benders decomposition (BD) is a widely used approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained with REINFORCE, a policy gradient algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and exhibits strong generalization to problems with similar structures but varying data inputs and decision variable dimensions.
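
The training step described in the abstract, a stochastic keep/drop policy optimized with REINFORCE, can be illustrated with the minimal sketch below. It is not RLBD's implementation: the single slack feature, the logistic policy (a one-layer stand-in for the paper's neural network), the synthetic reward, and the moving-average baseline are all assumptions. In RLBD the return would come from running Benders iterations with the selected cuts, not from a hidden usefulness label.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def keep_prob(w, feats):
    """pi(keep | features) for a logistic policy; w[-1] is the bias term."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)) + w[-1])

def sample_episode(rng, n_cuts=10):
    """Hypothetical episode: each candidate cut has one feature, a normalised
    slack in [0, 1]. The 'useful' flag is hidden from the policy and only
    enters the reward."""
    cuts = []
    for _ in range(n_cuts):
        slack = rng.random()
        cuts.append(([slack], slack < 0.3))
    return cuts

def episode_return(cuts, actions, keep_cost=0.3):
    """Hypothetical return: +1 per useful cut kept, minus keep_cost per kept cut."""
    return sum((1.0 if useful else 0.0) - keep_cost
               for (_, useful), kept in zip(cuts, actions) if kept)

def train(episodes=3000, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0]   # [slack weight, bias]
    baseline = 0.0   # running mean of returns, used to reduce variance
    for _ in range(episodes):
        cuts = sample_episode(rng)
        probs = [keep_prob(w, feats) for feats, _ in cuts]
        actions = [rng.random() < p for p in probs]   # sample keep/drop per cut
        advantage = episode_return(cuts, actions) - baseline
        # REINFORCE: w += lr * (G - b) * grad log pi(actions).
        # For a Bernoulli(sigmoid(logit)) action, d log pi / d logit = a - p.
        for (feats, _), kept, p in zip(cuts, actions, probs):
            g = (1.0 if kept else 0.0) - p
            for j, f in enumerate(feats):
                w[j] += lr * advantage * g * f
            w[-1] += lr * advantage * g
        baseline += 0.05 * advantage   # moving average of episode returns
    return w
```

After `w = train()`, the policy should give low-slack cuts (for example `keep_prob(w, [0.05])`) a much higher keep probability than high-slack ones (`keep_prob(w, [0.9])`). Subtracting the baseline keeps the gradient estimate unbiased, because the baseline does not depend on the current episode's actions, while reducing its variance.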

Problem

Research questions and friction points this paper is trying to address.

Benders decomposition

slow convergence

two-stage stochastic programming

cut selection

computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning

Benders Decomposition

Cut Selection