Scalable Hierarchical Reinforcement Learning for Hyper Scale Multi-Robot Task Planning

📅 2024-12-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the curse of dimensionality, environmental dynamics, and poor cross-scenario generalization in multi-robot task planning for ultra-large-scale robotic mobile fulfillment systems (RMFS), this paper proposes a hierarchical reinforcement learning planner built on a temporal graph topology. The method features: (1) a novel Hierarchical Temporal Attention Network (HTAN) to model spatiotemporal dependencies; (2) a multi-stage curriculum learning framework enabling progressive capability acquisition; and (3) a counterfactual-rollout-based credit assignment algorithm ensuring fair reward allocation. Evaluated in both simulated and real-world RMFS deployments, the approach consistently outperforms state-of-the-art methods. Notably, it maintains high efficiency and strong robustness even on unseen maps with 200 robots and 1,000 shelves, demonstrating significant improvements in scalability, generalization, and training stability.

📝 Abstract
To improve the efficiency of warehousing systems and meet massive customer order volumes, we aim to solve the challenges of the curse of dimensionality and dynamic properties in hyper-scale multi-robot task planning (MRTP) for robotic mobile fulfillment systems (RMFS). Existing research indicates that hierarchical reinforcement learning (HRL) is an effective way to mitigate these challenges. On this basis, we construct an efficient multi-stage HRL-based multi-robot task planner for hyper-scale MRTP in RMFS, and represent the planning process with a special temporal graph topology. To ensure optimality, the planner adopts a centralized architecture, but this also introduces the challenges of scaling up and generalization, requiring policies to maintain performance on various unlearned scales and maps. To tackle these difficulties, we first construct a hierarchical temporal attention network (HTAN) to provide a basic ability to handle inputs of variable length, and then design multi-stage curricula for hierarchical policy learning to further improve scalability and generalization while avoiding catastrophic forgetting. Additionally, we observe that policies with a hierarchical structure suffer from unfair credit assignment, similar to that in multi-agent reinforcement learning; inspired by this, we propose a hierarchical reinforcement learning algorithm with a counterfactual rollout baseline to improve learning performance. Experimental results demonstrate that our planner outperforms other state-of-the-art methods on various MRTP instances in both simulated and real-world RMFS. Moreover, our planner successfully scales up to hyper-scale MRTP instances with up to 200 robots and 1,000 retrieval racks on unlearned maps while maintaining superior performance over other methods.
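The counterfactual rollout baseline mentioned in the abstract can be illustrated with a minimal sketch: a decision's advantage is measured against the average return of counterfactual alternatives rolled out from the same state, so the estimate isolates that decision's own contribution. The names `rollout`, `state`, and `candidates` below are illustrative stand-ins; the paper's actual estimator and interfaces are not reproduced here.

```python
# Hedged sketch of a counterfactual-rollout baseline for credit
# assignment. `rollout(state, action)` is an assumed simulator hook
# returning an episodic return; it is NOT the paper's API.

def counterfactual_advantage(state, action, candidates, rollout):
    """Advantage of `action` relative to the mean return of the
    counterfactual candidate actions, evaluated from the same state
    with the rest of the hierarchy's decisions held fixed."""
    actual_return = rollout(state, action)
    # Counterfactual baseline: average return had we chosen otherwise.
    baseline = sum(rollout(state, a) for a in candidates) / len(candidates)
    return actual_return - baseline
```

With a toy `rollout` that simply returns the chosen action's value, picking the best of three candidates yields a positive advantage, while picking the average one yields zero, which is the fairness property such a baseline targets.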
Problem

Research questions and friction points this paper is trying to address.

Multi-Robot Coordination
Dynamic Environments
Fair Reward Allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Reinforcement Learning
Task Allocation Optimization
Multi-stage Learning Curriculum
Xuan Zhou
State Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China
Xiang Shi
Department of Automation, Tsinghua University, Beijing 100084, China
Lele Zhang
State Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China
Chen Chen
State Key Lab of Autonomous Intelligent Unmanned Systems, School of Automation, Beijing Institute of Technology, Beijing 100081, China
Hongbo Li
Beijing Geek+ Technology Co., Ltd, Beijing 100000, China
Linkang Ma
Zhejiang Cainiao Supply Chain Management Company Ltd., Hangzhou 311101, China
Fang Deng
Beijing Institute of Technology
New Energy; Intelligent Information Processing; Intelligent Wearable System
Jie Chen
State Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing 100081, China, and also with Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 200092, China