🤖 AI Summary
Multi-agent pathfinding (MAPF) in partially observable dynamic environments suffers from combinatorial explosion, coordination difficulties, and sparse rewards; existing decentralized reinforcement learning (RL) approaches often incur collisions, rely on complex inter-agent communication, and require weeks of training. This paper introduces DT-LLM, the first decentralized framework integrating offline RL, Decision Transformers (DTs), and a large language model (GPT-4o). Its core innovations are: (1) leveraging DTs for long-horizon credit assignment without online interaction, and (2) employing GPT-4o to dynamically generate policy-guidance signals, replacing explicit communication and real-time coordination. Experiments demonstrate that DT-LLM reduces training time from weeks to hours, lowers collision rates by 42% on average, and improves task success rates by 31% on average across both static and dynamic scenarios, while exhibiting strong generalization to unseen environments.
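The summary's first innovation, conditioning on return-to-go so that sparse, delayed rewards propagate back through long trajectories, follows the standard Decision Transformer recipe. Below is a minimal sketch of that input construction only (no model); the helper names `returns_to_go` and `build_dt_sequence` are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of Decision Transformer input construction (assumption:
# undiscounted returns-to-go, as in the original DT formulation).

def returns_to_go(rewards):
    """Suffix sums: the return the agent should still collect from step t."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))

def build_dt_sequence(states, actions, rewards):
    """Interleave (return-to-go, state, action) triples; a DT is trained
    autoregressively to predict action_t from the tokens up to (rtg_t, s_t),
    which is how credit from a sparse terminal reward reaches early steps."""
    rtg = returns_to_go(rewards)
    seq = []
    for g, s, a in zip(rtg, states, actions):
        seq.extend([("rtg", g), ("state", s), ("action", a)])
    return seq

# Example: a 3-step trajectory with a single sparse reward at the end.
# Every step's return-to-go is 1.0, so even the first action token is
# conditioned on the eventual success signal.
seq = build_dt_sequence(states=[0, 1, 2],
                        actions=["up", "up", "stay"],
                        rewards=[0.0, 0.0, 1.0])
```

Because training is purely offline over such sequences, no environment interaction is needed, which is consistent with the weeks-to-hours training-time claim.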
📝 Abstract
Multi-Agent Path Finding (MAPF) is a challenging problem critical to applications in robotics and logistics, particularly due to its combinatorial complexity and the partial observability inherent in realistic environments. Decentralized reinforcement learning methods commonly face two substantial difficulties: first, they often yield self-centered agent behaviors that result in frequent collisions, and second, their reliance on complex communication modules prolongs training, sometimes to weeks. To address these challenges, we propose an efficient decentralized planning framework based on the Decision Transformer (DT), uniquely leveraging offline reinforcement learning to cut training time from weeks to mere hours. Crucially, our approach effectively handles long-horizon credit assignment and significantly improves performance in scenarios with sparse and delayed rewards. Furthermore, to overcome the limited adaptability of standard RL methods under dynamic environmental changes, we integrate a large language model (GPT-4o) to dynamically guide agent policies. Extensive experiments in both static and dynamically changing environments demonstrate that our DT-based approach, augmented by GPT-4o guidance, significantly enhances adaptability and performance.
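The abstract states that GPT-4o dynamically guides agent policies in place of explicit inter-agent communication. The paper's actual interface is not shown here; the following is a hypothetical sketch of how a per-agent partial observation might be serialized into a guidance request, with `build_guidance_prompt` and its fields being assumptions for illustration (no API call is made).

```python
# Hypothetical guidance-prompt builder: turns one agent's local view into
# a short request for a policy hint. The prompt schema is an assumption,
# not the paper's interface; sending it to GPT-4o is left out.

def build_guidance_prompt(agent_id, obstacles, goal, nearby_agents):
    """Summarize a partial observation as text and ask for a single
    discrete action hint, so the model's reply can steer the DT policy
    without any agent-to-agent message passing."""
    return (
        f"Agent {agent_id} sees obstacles at {sorted(obstacles)} and is "
        f"heading to {goal}. Nearby agents occupy {sorted(nearby_agents)}. "
        "Reply with exactly one of: up, down, left, right, wait."
    )

prompt = build_guidance_prompt(agent_id=3,
                               obstacles={(1, 2)},
                               goal=(5, 5),
                               nearby_agents={(2, 2)})
```

Constraining the reply to a fixed action vocabulary keeps the guidance signal cheap to parse and easy to inject as an extra conditioning token for the decentralized policy.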