🤖 AI Summary
This work addresses the slow convergence of Benders decomposition in two-stage stochastic programming, which arises as cuts accumulate and the master problem grows. It introduces a reinforcement learning framework (RLBD) in which a neural network policy adaptively selects cuts. The policy is trained with the REINFORCE algorithm to retain effective Benders cuts and discard redundant ones. On an electric vehicle charging station location problem, the method improves computational efficiency over both classical Benders decomposition and a supervised learning baseline (LearnBD, which classifies cuts with a support vector machine). The learned policy also generalizes to instances with similar structure but different data inputs and decision variable dimensions.
📝 Abstract
Benders decomposition (BD) is a widely used approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often converges slowly because the master problem grows as cuts accumulate. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained with REINFORCE, a policy gradient method. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and generalizes well to problems with similar structures but varying data inputs and decision variable dimensions.
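To make the idea of a stochastic cut-selection policy trained with REINFORCE concrete, the following is a minimal sketch and not the RLBD implementation. It assumes a linear-logistic policy in place of the paper's neural network, hypothetical per-cut feature vectors, and a toy reward (`toy_reward`) standing in for whatever efficiency signal the actual method uses. All function names are illustrative.

```python
import numpy as np

def keep_probabilities(theta, features):
    """Bernoulli keep-probability for each candidate cut (logistic policy)."""
    return 1.0 / (1.0 + np.exp(-(features @ theta)))

def sample_selection(theta, features, rng):
    """Sample a keep (1) / drop (0) decision for every candidate cut."""
    probs = keep_probabilities(theta, features)
    actions = (rng.random(probs.shape) < probs).astype(float)
    return actions, probs

def reinforce_step(theta, features, actions, probs, reward, baseline, lr):
    """One REINFORCE update: theta += lr * (reward - baseline) * grad log pi."""
    # For a Bernoulli-logistic policy, grad_theta log pi(a | x) = (a - p) * x,
    # summed over all cuts decided in this step.
    grad_log_pi = features.T @ (actions - probs)
    return theta + lr * (reward - baseline) * grad_log_pi

def toy_reward(actions, useful):
    """Hypothetical reward: +1 per useful cut kept, -1 per redundant cut kept."""
    return float(np.sum(actions * (2.0 * useful - 1.0)))

def train(num_episodes=2000, num_cuts=20, num_features=3, lr=0.05, seed=0):
    """Train the toy policy on synthetic cut features (assumed, not from the paper)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(num_features)
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(num_episodes):
        features = rng.normal(size=(num_cuts, num_features))
        useful = (features[:, 0] > 0).astype(float)  # toy ground truth
        actions, probs = sample_selection(theta, features, rng)
        reward = toy_reward(actions, useful)
        theta = reinforce_step(theta, features, actions, probs, reward, baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward
    return theta

if __name__ == "__main__":
    print("learned policy weights:", train())
```

In a real Benders loop, the sampled `actions` would decide which newly generated cuts enter the master problem, and the reward would be computed from the solver's observed behavior rather than from a known label.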