Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning

📅 2024-12-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the curse of dimensionality, environmental dynamics, and poor cross-scenario generalization in multi-robot task planning for ultra-large-scale robotic mobile fulfillment systems (RMFS), this paper proposes a hierarchical reinforcement learning planner built on a temporal graph topology. The method features: (1) a novel Hierarchical Temporal Attention Network (HTAN) to model spatiotemporal dependencies; (2) a multi-stage curriculum learning framework enabling progressive capability acquisition; and (3) a counterfactual-rollout-based credit assignment algorithm ensuring fair reward allocation. Evaluated in both simulated and real-world RMFS deployments, the approach consistently outperforms state-of-the-art methods. Notably, it maintains high efficiency and strong robustness even on unseen maps with 200 robots and 1,000 shelves, demonstrating significant improvements in scalability, generalization, and training stability.

📝 Abstract
To improve the efficiency of warehousing systems and meet massive customer order volumes, we aim to solve the challenges of the curse of dimensionality and dynamic properties in hyper-scale multi-robot task planning (MRTP) for robotic mobile fulfillment systems (RMFS). Existing research indicates that hierarchical reinforcement learning (HRL) is an effective way to mitigate these challenges. On this basis, we construct an efficient multi-stage HRL-based multi-robot task planner for hyper-scale MRTP in RMFS, and represent the planning process with a special temporal graph topology. To ensure optimality, the planner adopts a centralized architecture, but this also introduces the challenges of scaling up and generalization, requiring policies to maintain performance on various unlearned scales and maps. To tackle these difficulties, we first construct a hierarchical temporal attention network (HTAN) to provide a basic ability to handle inputs of variable length, and then design multi-stage curricula for hierarchical policy learning to further improve scalability and generalization while avoiding catastrophic forgetting. Additionally, we observe that policies with a hierarchical structure suffer from unfair credit assignment, similar to that in multi-agent reinforcement learning; inspired by this, we propose a hierarchical reinforcement learning algorithm with a counterfactual rollout baseline to improve learning performance. Experimental results demonstrate that our planner outperforms other state-of-the-art methods on various MRTP instances in both simulated and real-world RMFS. Moreover, our planner successfully scales up to hyper-scale MRTP instances with up to 200 robots and 1,000 retrieval racks on unlearned maps while maintaining superior performance over other methods.
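The counterfactual rollout baseline mentioned in the abstract can be illustrated with a minimal sketch: a decision's advantage is measured against the average return of counterfactual alternatives rolled out from the same state, so the estimate isolates that decision's own contribution. The names `rollout`, `state`, and `candidates` below are illustrative stand-ins; the paper's actual estimator and interfaces are not reproduced here.

```python
# Hedged sketch of a counterfactual-rollout baseline for credit
# assignment. `rollout(state, action)` is an assumed simulator hook
# returning an episodic return; it is NOT the paper's API.

def counterfactual_advantage(state, action, candidates, rollout):
    """Advantage of `action` relative to the mean return of the
    counterfactual candidate actions, evaluated from the same state
    with the rest of the hierarchy's decisions held fixed."""
    actual_return = rollout(state, action)
    # Counterfactual baseline: average return had we chosen otherwise.
    baseline = sum(rollout(state, a) for a in candidates) / len(candidates)
    return actual_return - baseline
```

With a toy `rollout` that simply returns the chosen action's value, picking the best of three candidates yields a positive advantage, while picking the average one yields zero, which is the fairness property such a baseline targets.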
Problem

Research questions and friction points this paper is trying to address.

Multi-Robot Coordination
Dynamic Environments
Fair Reward Allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Reinforcement Learning
Task Allocation Optimization
Multi-stage Learning Curriculum
Xuan Zhou
State Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China
Xiang Shi
Department of Automation, Tsinghua University, Beijing 100084, China
Lele Zhang
State Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China
Chen Chen
State Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China
Hongbo Li
Beijing Geek+ Technology Co., Ltd, Beijing 100000, China
Linkang Ma
Zhejiang Cainiao Supply Chain Management Company Ltd., Hangzhou 311101, China
Fang Deng
Beijing Institute of Technology
New Energy; Intelligent Information Processing; Intelligent Wearable System
Jie Chen
State Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing 100081, China, and also with Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 200092, China