PROPEL: Supervised and Reinforcement Learning for Large-Scale Supply Chain Planning

📅 2025-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large-scale supply chain planning (SCP) is formulated as mixed-integer programming (MIP) models with non-binary integer and continuous variables, flow-balance constraints, and capacity limits, which makes industrial instances hard for exact solvers. Method: The paper proposes PROPEL, a framework that combines optimization with supervised learning and deep reinforcement learning (DRL). Supervised learning is used not to predict the values of all integer variables but to identify the variables that are fixed to zero in the optimal solution, exploiting the structure of SCP applications; this addresses a limitation of earlier ML-and-optimization integrations, which focused on binary MIPs and graph problems. When the reduced problem does not reach the desired optimality tolerance, a DRL component selects which fixed-at-zero variables to relax. Results: On industrial instances with millions of variables, PROPEL achieves a 60% reduction in primal integral and an 88% reduction in primal gap, with improvement factors of up to 13.57 and 15.92, respectively.
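The two reported metrics can be made concrete with a short sketch. It follows a convention that is common in the MIP literature (gap normalized to [0, 1], counted as 1 before the first feasible solution); the paper may define the metrics differently. The function names and the sample incumbent trajectory are illustrative assumptions, not taken from the paper.

```python
def primal_gap(obj, best_known):
    """Primal gap in [0, 1] between an incumbent objective and the best known one."""
    if obj == 0 and best_known == 0:
        return 0.0
    if obj * best_known < 0:  # opposite signs: treat as the worst case
        return 1.0
    return abs(obj - best_known) / max(abs(obj), abs(best_known))


def primal_integral(incumbents, best_known, time_limit):
    """Integrate the primal gap, a step function of time, over [0, time_limit].

    incumbents: list of (time_found, objective) pairs sorted by time.
    Before the first incumbent is found, the gap counts as 1.
    """
    total, t_prev, gap_prev = 0.0, 0.0, 1.0
    for t, obj in incumbents:
        if t >= time_limit:
            break
        total += gap_prev * (t - t_prev)
        t_prev, gap_prev = t, primal_gap(obj, best_known)
    total += gap_prev * (time_limit - t_prev)
    return total


# Hypothetical run: incumbents found at t=2s (objective 120) and t=5s (objective 100).
trajectory = [(2.0, 120.0), (5.0, 100.0)]
print(primal_integral(trajectory, best_known=100.0, time_limit=10.0))
```

A smaller primal integral means good solutions were found earlier in the run, which is why it is often reported alongside the final primal gap.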

📝 Abstract

This paper considers how to fuse Machine Learning (ML) and optimization to solve large-scale Supply Chain Planning (SCP) optimization problems. These problems can be formulated as MIP models which feature both integer (non-binary) and continuous variables, as well as flow balance and capacity constraints. This raises fundamental challenges for existing integrations of ML and optimization that have focused on binary MIPs and graph problems. To address these, the paper proposes PROPEL, a new framework that combines optimization with both supervised and Deep Reinforcement Learning (DRL) to significantly reduce the size of the search space. PROPEL uses supervised learning, not to predict the values of all integer variables, but to identify the variables that are fixed to zero in the optimal solution, leveraging the structure of SCP applications. PROPEL includes a DRL component that selects which fixed-at-zero variables must be relaxed to improve solution quality when the supervised learning step does not produce a solution with the desired optimality tolerance. PROPEL has been applied to industrial supply chain planning optimizations with millions of variables. The computational results show dramatic improvements in solution times and quality, including a 60% reduction in primal integral and an 88% primal gap reduction, and improvement factors of up to 13.57 and 15.92, respectively.
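The fix-then-relax idea described in the abstract can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's implementation: the brute-force `solve_restricted` stands in for a real MIP solver, `predicted_zero` stands in for the output of the supervised model, and `cheapest_first` is a hand-written rule standing in for the learned DRL policy. All names and data are hypothetical.

```python
from itertools import product


def solve_restricted(costs, caps, demand, fixed_zero):
    """Brute-force a toy integer program (stand-in for a real MIP solver).

    minimize sum(costs[i] * x[i])
    subject to sum(x) >= demand, 0 <= x[i] <= caps[i], x[i] = 0 for i in fixed_zero.
    Returns (objective, solution), or (None, None) if infeasible.
    """
    ranges = [range(1) if i in fixed_zero else range(caps[i] + 1)
              for i in range(len(costs))]
    best_val, best_x = None, None
    for x in product(*ranges):
        if sum(x) < demand:
            continue
        val = sum(c * xi for c, xi in zip(costs, x))
        if best_val is None or val < best_val:
            best_val, best_x = val, x
    return best_val, best_x


def relative_gap(val, lower_bound):
    """Gap of an incumbent against a known lower bound (assumes positive costs)."""
    if val is None:
        return float("inf")  # no feasible solution yet
    return (val - lower_bound) / val if val > 0 else 0.0


def cheapest_first(fixed, costs):
    """Hypothetical relaxation rule standing in for the learned DRL policy."""
    return min(fixed, key=lambda i: costs[i])


def fix_and_relax(costs, caps, demand, predicted_zero, lower_bound, tol=0.05):
    """Fix predicted variables at zero, then relax one at a time until within tol."""
    fixed = set(predicted_zero)
    val, x = solve_restricted(costs, caps, demand, fixed)
    while relative_gap(val, lower_bound) > tol and fixed:
        fixed.remove(cheapest_first(fixed, costs))
        val, x = solve_restricted(costs, caps, demand, fixed)
    return val, x, fixed


costs, caps, demand = [1, 2, 5, 3], [2, 2, 3, 3], 5
# The assumed prediction wrongly fixes variable 0, so one relaxation round is needed.
val, x, still_fixed = fix_and_relax(costs, caps, demand,
                                    predicted_zero={0, 2}, lower_bound=9)
print(val, x, still_fixed)  # 9 (2, 2, 0, 1) {2}
```

In this toy run the first restricted solve returns a feasible but poor solution (objective 13), the relaxation rule frees variable 0, and the second solve reaches the optimum while variable 2 stays fixed at zero. The point of the design is that each restricted solve searches a much smaller space than the full problem.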

Problem

Research questions and friction points this paper is trying to address.

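Fuse ML and optimization to solve large-scale Supply Chain Planning (SCP) problems.

Handle MIP models with non-binary integer and continuous variables, flow-balance constraints, and capacity constraints, which fall outside the binary MIPs and graph problems targeted by existing ML-and-optimization integrations.

Reduce the size of the search space using supervised learning and deep reinforcement learning.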
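For orientation, the kind of model these points refer to can be written in a generic textbook form. The formulation below is an illustrative multi-period production-planning MIP, not the model used in the paper; the symbols are assumptions introduced here: $x_{p,t}$ is an integer production quantity for product $p$ in period $t$, $I_{p,t}$ a continuous inventory level, $d_{p,t}$ the demand, $c_p$ and $h_p$ unit production and holding costs, $a_p$ the resource use per unit, and $C_t$ the capacity in period $t$.

```latex
\begin{align}
\min_{x,\,I}\quad & \sum_{p}\sum_{t}\left(c_{p}\,x_{p,t} + h_{p}\,I_{p,t}\right) \\
\text{s.t.}\quad & I_{p,t} = I_{p,t-1} + x_{p,t} - d_{p,t} && \forall p,\,t \quad \text{(flow balance)} \\
& \sum_{p} a_{p}\,x_{p,t} \le C_{t} && \forall t \quad \text{(capacity)} \\
& x_{p,t} \in \mathbb{Z}_{\ge 0},\quad I_{p,t} \ge 0 && \forall p,\,t
\end{align}
```

Because $x_{p,t}$ ranges over general non-negative integers rather than $\{0, 1\}$, predicting its exact value is harder than in a binary MIP; predicting only whether it is zero is a coarser and more tractable target.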

Innovation

Methods, ideas, or system contributions that make the work stand out.

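Combines optimization with both supervised learning and deep reinforcement learning

Reduces the search space by identifying integer variables that are fixed to zero in the optimal solution

Uses DRL to select which fixed-at-zero variables to relax when the desired optimality tolerance is not met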

🔎 Similar Papers

Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations