🤖 AI Summary
Multi-agent pathfinding (MAPF) in partially observable dynamic environments suffers from combinatorial explosion, coordination difficulties, and sparse rewards; existing decentralized reinforcement learning (RL) approaches often incur collisions, rely on complex inter-agent communication, and require weeks of training. This paper introduces DT-LLM, the first decentralized framework integrating offline RL, Decision Transformers (DTs), and a large language model (GPT-4o). Its core innovations are: (1) leveraging DTs for long-horizon credit assignment without online interaction, and (2) employing GPT-4o to dynamically generate policy-guidance signals, replacing explicit communication and real-time coordination. Experiments demonstrate that DT-LLM reduces training time from weeks to hours, lowers collision rates by 42% on average, and improves task success rates by 31% on average across both static and dynamic scenarios, while exhibiting strong generalization to unseen environments.
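The summary's first innovation, conditioning on return-to-go so that sparse, delayed rewards propagate back through long trajectories, follows the standard Decision Transformer recipe. Below is a minimal sketch of that input construction only (no model); the helper names `returns_to_go` and `build_dt_sequence` are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of Decision Transformer input construction (assumption:
# undiscounted returns-to-go, as in the original DT formulation).

def returns_to_go(rewards):
    """Suffix sums: the return the agent should still collect from step t."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))

def build_dt_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples; a DT is trained
    autoregressively to predict action_t from the tokens up to (rtg_t, s_t),
    which is how credit from a sparse terminal reward reaches early steps."""
    rtg = returns_to_go(rewards)
    seq = []
    for g, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq

# Example: a 3-step trajectory with a single sparse reward at the end.
# Every step's return-to-go is 1.0, so even the first action token is
# conditioned on the eventual success signal.
seq = build_dt_sequence(states=[0, 1, 2],
                        actions=["up", "up", "stay"],
                        rewards=[0.0, 0.0, 1.0])
```

Because training is purely offline over such sequences, no environment interaction is needed, which is consistent with the weeks-to-hours training-time claim.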
📝 Abstract
Multi-Agent Path Finding (MAPF) is a challenging problem critical to applications in robotics and logistics, particularly due to its combinatorial complexity and the partial observability inherent in realistic environments. Decentralized reinforcement learning methods commonly face two substantial difficulties: first, they often yield self-centered agent behaviors that result in frequent collisions, and second, their reliance on complex communication modules prolongs training, sometimes to weeks. To address these challenges, we propose an efficient decentralized planning framework based on the Decision Transformer (DT), uniquely leveraging offline reinforcement learning to cut training time from weeks to mere hours. Crucially, our approach effectively handles long-horizon credit assignment and significantly improves performance in scenarios with sparse and delayed rewards. Furthermore, to overcome the limited adaptability of standard RL methods under dynamic environmental changes, we integrate a large language model (GPT-4o) to dynamically guide agent policies. Extensive experiments in both static and dynamically changing environments demonstrate that our DT-based approach, augmented by GPT-4o guidance, significantly enhances adaptability and performance.
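The abstract states that GPT-4o dynamically guides agent policies in place of explicit inter-agent communication. The paper's actual interface is not shown here; the following is a hypothetical sketch of how a per-agent partial observation might be serialized into a guidance request, with `build_guidance_prompt` and its fields being assumptions for illustration (no API call is made).

```python
# Hypothetical guidance-prompt builder: turns one agent's local view into
# a short request for a policy hint. The prompt schema is an assumption,
# not the paper's interface; sending it to GPT-4o is left out.

def build_guidance_prompt(agent_id, obstacles, goal, nearby_agents):
    """Summarize a partial observation as text and ask for a single
    discrete action hint, so the model's reply can steer the DT policy
    without any agent-to-agent message passing."""
    return (
        f"Agent {agent_id} sees obstacles at {sorted(obstacles)} and is "
        f"heading to {goal}. Nearby agents occupy {sorted(nearby_agents)}. "
        "Reply with exactly one of: up, down, left, right, wait."
    )

prompt = build_guidance_prompt(agent_id=3,
                               obstacles={(1, 2)},
                               goal=(5, 5),
                               nearby_agents={(2, 2)})
```

Constraining the reply to a fixed action vocabulary keeps the guidance signal cheap to parse and easy to inject as an extra conditioning token for the decentralized policy.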