🤖 AI Summary
To address the challenge of multi-objective, multi-tier supply chain coordination under non-stationary market conditions, this paper proposes a Markov decision process–based multi-objective reinforcement learning (MORL) framework. Methodologically, it introduces a shared experience replay buffer to enable cross-objective policy knowledge transfer and incorporates a Pareto front approximation mechanism to enhance both the quality and density of the solution set. Evaluated across multiple scenarios in a custom simulation environment, the framework outperforms both weighted single-objective RL and multi-objective evolutionary algorithms (MOEAs): it achieves up to 75% higher hypervolume than the MOEA-based method and yields solution sets approximately eleven times denser than those of the weighted single-objective RL method. Furthermore, it reduces demand loss and improves inventory stability. The proposed approach establishes a scalable, intelligent decision-making paradigm for triple-bottom-line optimization—integrating economic, environmental, and social objectives—in sustainable supply chains.
📝 Abstract
This study develops a generalised multi-objective, multi-echelon supply chain optimisation model for non-stationary markets based on a Markov decision process, incorporating economic, environmental, and social considerations. The model is solved using a multi-objective reinforcement learning (RL) method, benchmarked against a single-objective RL algorithm adapted via a weighted sum of objectives with predefined weights, and against a multi-objective evolutionary algorithm (MOEA)-based approach. We conduct experiments on networks of varying complexity, mimicking typical real-world challenges with a customisable simulator. The model determines production and delivery quantities across supply chain routes to achieve near-optimal trade-offs between competing objectives, approximating Pareto front sets. The results demonstrate that the proposed approach provides the most balanced trade-off between optimality, diversity, and density, further enhanced by a shared experience buffer that enables knowledge transfer among policies. In complex settings, it achieves up to 75% higher hypervolume than the MOEA-based method and generates solutions approximately eleven times denser, signifying better robustness, than those produced by the modified single-objective RL method. Moreover, it ensures stable production and inventory levels while minimising demand loss.
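The hypervolume indicator cited in the results measures the objective-space region dominated by an approximated Pareto front relative to a reference point; larger values indicate a front that is both closer to optimal and more diverse. A minimal two-objective sketch of the metric follows — this is an illustration of the indicator, not the paper's implementation; the point values and reference point are hypothetical, and a maximisation convention is assumed:

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Hypervolume of a 2-D maximisation front w.r.t. a reference point.

    Computes the area of the union of rectangles spanned between
    the reference point and each non-dominated solution.
    """
    # Keep only points that strictly improve on the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sort by the first objective, best first; along a non-dominated
    # front the second objective then increases monotonically.
    pts.sort(key=lambda p: p[0], reverse=True)
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:  # points not improving on prev_y are dominated
            hv += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return hv

# Illustrative three-solution trade-off front (hypothetical values).
front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
print(hypervolume_2d(front))  # → 6.0
```

In higher dimensions the same idea applies, but exact computation becomes expensive and is typically delegated to a library such as pymoo.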