MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of adapting multi-objective optimisation policies for multi-echelon combinatorial supply chains in dynamic environments, where task-specific retraining and high computational costs hinder responsiveness. To overcome this, the authors propose a hierarchical meta multi-objective reinforcement learning framework that combines structured subproblem decomposition, meta-reinforcement learning, and a diversity-driven Pareto front adaptation mechanism. This enables few-shot generalization across tasks and domain-agnostic dynamic decision-making. On tasks ranging from simple to moderately complex, the method achieves up to a 10% improvement in hypervolume and a 5% gain in expected utility over baseline approaches.
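The few-shot adaptation idea behind the framework can be illustrated with a toy first-order meta-learning loop in the style of Reptile: a shared initialisation is nudged toward task-adapted parameters so that a handful of gradient steps suffice on a new task. Everything below, including the scalar quadratic tasks and function names, is an invented sketch for intuition, not the paper's algorithm:

```python
def adapt(theta, task, lr=0.1, steps=5):
    """Few inner gradient steps on one task's quadratic loss (theta - task)^2."""
    for _ in range(steps):
        theta = theta - lr * 2.0 * (theta - task)  # gradient of the quadratic loss
    return theta

def meta_train(tasks, meta_lr=0.5, epochs=100):
    """Reptile-style outer loop: move the shared init toward adapted solutions."""
    theta = 0.0                                    # shared initialisation
    for _ in range(epochs):
        for task in tasks:
            adapted = adapt(theta, task)           # cheap per-task adaptation
            theta += meta_lr * (adapted - theta)   # first-order meta-update
    return theta
```

After meta-training on tasks centred at 1.0 and 3.0, the learned initialisation sits between them, so adapting to either task needs only a few steps; the paper's setting replaces these scalars with multi-objective supply chain policies.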

📝 Abstract
Multi-objective reinforcement learning (MORL) is effective for multi-echelon combinatorial supply chain optimisation, where tasks involve high dimensionality, uncertainty, and competing objectives. However, its deployment in dynamic environments is hindered by the need for task-specific retraining and substantial computational cost. We introduce MIRACL (Meta multI-objective Reinforcement leArning with Composite Learning), a hierarchical Meta-MORL framework that enables few-shot generalisation across diverse tasks. MIRACL decomposes each task into structured subproblems for efficient policy adaptation and meta-learns a global policy across tasks using a Pareto-based adaptation strategy that encourages diversity during meta-training and fine-tuning. To our knowledge, this is the first integration of Meta-MORL with such mechanisms in combinatorial optimisation. Although validated in the supply chain domain, MIRACL is theoretically domain-agnostic and applicable to broader dynamic multi-objective decision-making problems. Empirical evaluations show that MIRACL outperforms conventional MORL baselines on simple to moderate tasks, achieving up to 10% higher hypervolume and 5% better expected utility. These results underscore the potential of MIRACL for robust, efficient adaptation in multi-objective problems.
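The reported gains are measured by hypervolume: the area (in 2D; volume in higher dimensions) of objective space dominated by a Pareto front relative to a reference point, so a larger value means a better front. A minimal 2D sketch, assuming a maximisation setting and an invented point set (the paper's fronts and reference points are not given here):

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2D Pareto front (maximisation) w.r.t. reference point `ref`.

    `front` is a list of (f1, f2) objective pairs, each dominating `ref`.
    """
    # Sweep points by decreasing f1; each point contributes a disjoint rectangle
    # between the previous best f2 and its own f2.
    pts = sorted(front, key=lambda p: p[0], reverse=True)
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > prev_f2:                      # point extends the front upward
            hv += (f1 - ref[0]) * (f2 - prev_f2)
            prev_f2 = f2
    return hv

# Example: three mutually non-dominated points against reference (0, 0).
hv = hypervolume_2d([(1, 3), (2, 2), (3, 1)], (0, 0))  # → 6.0
```

A "10% higher hypervolume" thus means the method's Pareto front dominates roughly 10% more of the objective space than the baselines' fronts under the same reference point.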
Problem

Research questions and friction points this paper is trying to address.

multi-objective reinforcement learning
multi-echelon supply chain
combinatorial optimisation
dynamic environments
task-specific retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-Reinforcement Learning
Multi-Objective Optimization
Combinatorial Supply Chain
Pareto-based Adaptation
Few-shot Generalization