LNS2+RL: Combining Multi-agent Reinforcement Learning with Large Neighborhood Search in Multi-agent Path Finding

πŸ“… 2024-05-28
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Addressing the challenge of balancing solution quality, computational efficiency, and success rate in high-density Multi-Agent Path Finding (MAPF), this paper proposes LNS2+RL, a two-stage collaborative framework. In the first stage, a curriculum-learning-enhanced multi-agent reinforcement learning (MARL) policy (MAPPO) performs cooperative low-level replanning, using collision-aware state encoding and a sparse reward design so that agents can reason jointly over past and future information. In the second stage, the system adaptively switches to priority-based planning to quickly resolve the remaining collisions. The staged coupling between MARL and large neighborhood search (LNS2) dynamically trades off solution quality against computational overhead. Experiments show that on maps with complex structure, LNS2+RL achieves a success rate of over 50% in nearly half of the tested tasks, where the baselines LaCAM, EECBS, and SCRIMP fall to 0%. In high-density scenarios, it reduces average collisions by 37% and accelerates planning by 5.2Γ— compared to pure MARL.

πŸ“ Abstract
Multi-Agent Path Finding (MAPF) is a critical component of logistics and warehouse management, which focuses on planning collision-free paths for a team of robots in a known environment. Recent work introduced a novel MAPF approach, LNS2, which proposed to repair a quickly obtained set of infeasible paths via iterative replanning, by relying on a fast, yet lower-quality, prioritized planning (PP) algorithm. At the same time, there has been a recent push for Multi-Agent Reinforcement Learning (MARL) based MAPF algorithms, which exhibit improved cooperation over such PP algorithms, although inevitably remaining slower. In this paper, we introduce a new MAPF algorithm, LNS2+RL, which combines the distinct yet complementary characteristics of LNS2 and MARL to effectively balance their individual limitations and get the best of both worlds. During early iterations, LNS2+RL relies on MARL for low-level replanning, which we show eliminates far more collisions than a PP algorithm. In this stage, our MARL-based planner allows agents to reason about past and future information to gradually learn cooperative decision-making through a carefully designed curriculum. At later stages of planning, LNS2+RL adaptively switches to the PP algorithm to quickly resolve the remaining collisions, naturally trading off solution quality (number of collisions in the solution) and computational efficiency. Our comprehensive experiments on high-agent-density tasks across various team sizes, world sizes, and map structures consistently demonstrate the superior performance of LNS2+RL compared to many MAPF algorithms, including LNS2, LaCAM, EECBS, and SCRIMP. In maps with complex structures, the advantages of LNS2+RL are particularly pronounced, with LNS2+RL achieving a success rate of over 50% in nearly half of the tested tasks, while the success rates of LaCAM, EECBS, and SCRIMP fall to 0%.
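The staged repair loop the abstract describes can be sketched as follows. This is a minimal illustration of the control flow only, not the paper's implementation: the neighborhood is a single randomly chosen agent, and both replanners are hypothetical stand-in stubs (a real system would plug in the learned MARL policy early and a SIPPS-style PP solver late). All function and parameter names here are invented for the sketch.

```python
import random

def count_vertex_collisions(paths):
    """Count colliding agent pairs (same cell, same timestep).
    Agents that have reached their goal are assumed to wait there."""
    horizon = max(len(p) for p in paths.values())
    collisions = 0
    for t in range(horizon):
        seen = {}
        for path in paths.values():
            cell = path[t] if t < len(path) else path[-1]
            collisions += seen.get(cell, 0)  # one new pair per agent already at this cell
            seen[cell] = seen.get(cell, 0) + 1
    return collisions

def lns2_rl_sketch(paths, replan_marl, replan_pp, switch_ratio=0.2,
                   max_iters=50, seed=0):
    """Iteratively replan a neighborhood of agents: use the slower,
    more cooperative MARL replanner while many collisions remain,
    then adaptively switch to fast PP to finish off the rest."""
    rng = random.Random(seed)
    paths = dict(paths)
    initial = max(count_vertex_collisions(paths), 1)
    for _ in range(max_iters):
        c = count_vertex_collisions(paths)
        if c == 0:
            break  # all collisions repaired; solution is feasible
        agent = rng.choice(sorted(paths))  # neighborhood of size 1 for the sketch
        replan = replan_marl if c / initial > switch_ratio else replan_pp
        candidate = dict(paths)
        candidate[agent] = replan(agent, paths)
        if count_vertex_collisions(candidate) <= c:
            paths = candidate  # accept non-worsening repairs, as in LNS2
    return paths

def delay_replanner(agent, paths):
    """Stand-in replanner: insert one wait step at the agent's start."""
    path = paths[agent]
    return [path[0]] + list(path)

# Toy instance: two agents crossing a corridor both occupy (1, 0) at t=1.
demo = {"a": [(0, 0), (1, 0), (2, 0)],
        "b": [(2, 0), (1, 0), (0, 0)]}
solved = lns2_rl_sketch(demo, delay_replanner, delay_replanner)
```

The `switch_ratio` threshold stands in for the paper's adaptive switching rule: once the remaining collision count drops below a fraction of the initial count, the cheap PP replanner takes over from the MARL policy.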
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Path Finding
Efficiency
Collision Avoidance

Innovation

Methods, ideas, or system contributions that make the work stand out.

LNS2+RL
Multi-Agent Path Finding (MAPF)
Complex Environment