Learning to Solve the Min-Max Mixed-Shelves Picker-Routing Problem via Hierarchical and Parallel Decoding

📅 2025-02-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the min-max variant of the Mixed-Shelves Picker Routing Problem (MSPRP), which minimizes the makespan, i.e. the longest completion time among multiple pickers, to improve collaborative efficiency and system scalability. Traditional optimization methods scale poorly, and existing end-to-end learning approaches, particularly those relying on sequential decision-making, suffer from high latency and suboptimal coordination. To address both issues, the authors propose a hierarchical, parallel multi-agent reinforcement learning framework. It uses graph neural networks to encode environment states and a hierarchical policy network to jointly model action distributions, and it introduces two key innovations: (i) a parallel decoding mechanism and (ii) a decoupling strategy that separates action-space selection from sequence generation, so that agents can act concurrently without conflicts. Experiments show that the method achieves both state-of-the-art solution quality and the fastest inference speed on large-scale and out-of-distribution instances, outperforming conventional heuristics and current learning-based baselines.
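
The difference between a min-max (makespan) objective and the more common min-sum objective can be illustrated with a minimal sketch. The depot location, the Euclidean distance metric, and the function names `tour_length`, `makespan`, and `total_length` are assumptions made for illustration; they are not taken from the paper or its code.

```python
import math

def tour_length(depot, stops):
    """Length of a closed tour depot -> stops -> depot (Euclidean, assumed metric)."""
    path = [depot, *stops, depot]
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def makespan(depot, tours):
    """Min-max objective: the longest tour among all pickers."""
    return max(tour_length(depot, t) for t in tours)

def total_length(depot, tours):
    """Min-sum objective: the combined length of all tours."""
    return sum(tour_length(depot, t) for t in tours)

depot = (0.0, 0.0)
# Hypothetical plans for two pickers and two shelf locations.
unbalanced = [[(3.0, 0.0), (3.0, 4.0)], []]   # picker 1 visits both shelves
balanced = [[(3.0, 0.0)], [(3.0, 4.0)]]       # one shelf per picker

for name, plan in [("unbalanced", unbalanced), ("balanced", balanced)]:
    print(name, total_length(depot, plan), makespan(depot, plan))
```

In this toy instance the unbalanced plan has the smaller total length (12 versus 16), while the balanced plan has the smaller makespan (10 versus 12), which is why the two objectives can prefer different solutions.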

📝 Abstract

The Mixed-Shelves Picker Routing Problem (MSPRP) is a fundamental challenge in warehouse logistics, where pickers must navigate a mixed-shelves environment to retrieve SKUs efficiently. Traditional heuristics and optimization-based approaches struggle with scalability, while recent machine learning methods often rely on sequential decision-making, leading to high solution latency and suboptimal agent coordination. In this work, we propose a novel hierarchical and parallel decoding approach for solving the min-max variant of the MSPRP via multi-agent reinforcement learning. Our approach generates a joint distribution over agent actions, allowing for fast decoding and effective picker coordination, and then selects actions sequentially to avoid conflicts in the multi-dimensional action space. Experiments show state-of-the-art performance in both solution quality and inference speed, particularly for large-scale and out-of-distribution instances. Our code is publicly available at http://github.com/LTluttmann/marl4msprp.
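
One way to picture the combination of parallel scoring and sequential, conflict-avoiding selection is the greedy sketch below. It is not the authors' implementation: the function `select_conflict_free`, the score matrix, and the convention that action 0 is a shared action any number of agents may take (for example, staying at the depot) are all assumptions made for illustration.

```python
def select_conflict_free(logits, shared=frozenset({0})):
    """Greedy conflict resolution over per-agent action scores.

    logits[i][j] is the score of agent i taking action j, assumed to be
    produced for all agents in one parallel forward pass. Actions in
    `shared` may be chosen by several agents; every other action may be
    chosen by at most one agent.
    """
    n_agents, n_actions = len(logits), len(logits[0])
    assignment = [None] * n_agents
    taken = set()
    for _ in range(n_agents):
        best = None
        # Find the highest-scoring (agent, action) pair that is still feasible.
        for i in range(n_agents):
            if assignment[i] is not None:
                continue
            for j in range(n_actions):
                if j in taken:
                    continue
                if best is None or logits[i][j] > best[0]:
                    best = (logits[i][j], i, j)
        if best is None:
            raise ValueError("no feasible action left for the remaining agents")
        _, i, j = best
        assignment[i] = j
        if j not in shared:
            taken.add(j)  # mask this action for all remaining agents
    return assignment

# Hypothetical scores for three agents over three actions.
logits = [
    [0.0, 2.0, 1.0],
    [0.0, 3.0, 0.5],
    [0.0, 2.5, -1.0],
]
naive = [max(range(len(row)), key=row.__getitem__) for row in logits]
print(naive)                         # [1, 1, 1]: every agent wants action 1
print(select_conflict_free(logits))  # [2, 1, 0]
```

Independent per-agent argmax sends all three agents to the same action, whereas the sequential pass commits the most confident pair first and masks that action for the rest. The scores are computed once; only the cheap selection step is sequential.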

Problem

Research questions and friction points this paper is trying to address.

Efficient warehouse picker routing

Scalable multi-agent coordination

Min-max MSPRP solution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical parallel decoding

Multi-agent reinforcement learning

Sequential action selection

🔎 Similar Papers

Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations