Classical and Deep Reinforcement Learning Inventory Control Policies for Pharmaceutical Supply Chains with Perishability and Non-Stationarity

📅 2025-01-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses practical challenges in pharmaceutical supply chains, including short shelf life, uncertain production yields, and non-stationary demand, together with batching constraints, lead times, and lost sales, through a case study developed in collaboration with Bristol-Myers Squibb (BMS). It benchmarks three policies against a BMS baseline based on human expertise: Order-Up-To (OUT), Projected Inventory Level (PIL), and deep reinforcement learning (DRL) using Proximal Policy Optimization (PPO). Methodologically, it derives and validates bounds-based procedures for tuning the OUT and PIL parameters and proposes a method for estimating projected inventory levels, which are fed into the DRL policy alongside demand forecasts. All three policies achieve lower average costs than the baseline but with greater cost variability. PIL is robust and consistent, OUT struggles under high lost-sales costs, and PPO excels in complex, variable scenarios at significant computational cost. The results indicate that no single policy class performs acceptably across all settings, so diverse policies need to be integrated and matched to specific operational characteristics.

📝 Abstract
We study inventory control policies for pharmaceutical supply chains, addressing challenges such as perishability, yield uncertainty, and non-stationary demand, combined with batching constraints, lead times, and lost sales. Collaborating with Bristol-Myers Squibb (BMS), we develop a realistic case study incorporating these factors and benchmark three policies--order-up-to (OUT), projected inventory level (PIL), and deep reinforcement learning (DRL) using the proximal policy optimization (PPO) algorithm--against a BMS baseline based on human expertise. We derive and validate bounds-based procedures for optimizing OUT and PIL policy parameters and propose a methodology for estimating projected inventory levels, which are also integrated into the DRL policy with demand forecasts to improve decision-making under non-stationarity. Compared to a human-driven policy, which avoids lost sales through higher holding costs, all three implemented policies achieve lower average costs but exhibit greater cost variability. While PIL demonstrates robust and consistent performance, OUT struggles under high lost sales costs, and PPO excels in complex and variable scenarios but requires significant computational effort. The findings suggest that while DRL shows potential, it does not outperform classical policies in all numerical experiments, highlighting 1) the need to integrate diverse policies to manage pharmaceutical challenges effectively, based on the current state-of-the-art, and 2) that practical problems in this domain seem to lack a single policy class that yields universally acceptable performance.
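
To make the order-up-to (OUT) setting in the abstract concrete, the following is a minimal sketch of an OUT policy simulated with perishability, a fixed lead time, batch ordering, random yield, and lost sales. It is not the paper's model or the BMS case study: the function name `simulate_out`, every parameter, the cost values, and the uniform yield distribution are assumptions chosen only for illustration, and the lead time is assumed to be at least one period.

```python
import math
import random
from collections import deque


def simulate_out(S, demands, shelf_life=3, lead_time=2, batch=10,
                 yield_range=(0.8, 1.0), h=1.0, p=20.0, w=5.0, seed=0):
    """Average per-period cost of an order-up-to-S policy (illustrative only).

    h, p, w are hypothetical unit holding, lost-sales, and waste costs.
    """
    rng = random.Random(seed)
    # on_hand[i] holds units with i + 1 periods of shelf life remaining.
    on_hand = [0.0] * shelf_life
    # Outstanding orders, oldest first; assumes lead_time >= 1.
    pipeline = deque([0.0] * lead_time)
    total = 0.0
    for d in demands:
        # 1. Receive the oldest outstanding order as fresh stock.
        on_hand[-1] += pipeline.popleft()
        # 2. Raise the inventory position towards S, rounded up to full batches.
        position = sum(on_hand) + sum(pipeline)
        n_batches = math.ceil(max(0.0, S - position) / batch)
        # Random yield: only a fraction of the ordered quantity will arrive.
        pipeline.append(n_batches * batch * rng.uniform(*yield_range))
        # 3. Serve demand from the oldest stock first; unmet demand is lost.
        remaining = d
        for i in range(shelf_life):
            used = min(on_hand[i], remaining)
            on_hand[i] -= used
            remaining -= used
        # 4. Units in their last period expire; the rest age by one period.
        expired = on_hand[0]
        on_hand = on_hand[1:] + [0.0]
        total += h * sum(on_hand) + p * remaining + w * expired
    return total / len(demands)
```

Under these assumptions, a crude way to tune the policy is to evaluate `simulate_out(S, demands)` over a grid of candidate `S` values on a sampled demand path and keep the cheapest one. The bounds-based procedures described in the abstract would narrow that search, but they are not reproduced here.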

Problem

Research questions and friction points this paper is trying to address.

Pharmaceutical Supply Chain
Inventory Management
Cost Reduction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning
Inventory Management
Pharmaceutical Supply Chain
Francesco Stranieri
Università degli Studi di Milano-Bicocca
Deep Learning, Reinforcement Learning, Inventory Management, Artificial Intelligence
Chaaben Kouki
ESSCA School of Management, Angers, 49000, France
W. Jaarsveld
Eindhoven University of Technology, Eindhoven, 5612, Netherlands
Fabio Stella
Polytechnic of Turin, Turin, 10129, Italy