🤖 AI Summary
To address limited local observations, high explicit communication overhead, and low global coordination efficiency in Multi-Agent Pickup and Delivery (MAPD) tasks within narrow-aisle environments (e.g., warehouses), this paper proposes a sequence-based implicit coordination path planning framework, the Sequential Pathfinder (SePar). It jointly encodes multi-agent trajectories into a permutation-invariant sequence and uses a Transformer architecture to enable implicit information exchange without explicit inter-agent communication. The authors prove that the resulting sequential policy possesses order-invariant optimality, and the sequential formulation reduces decision complexity from exponential to linear in the number of agents. Combining distributed execution with imitation learning supports both real-time responsiveness and strong generalization. Experiments show that the method consistently outperforms existing learning-based approaches across multiple MAPF benchmarks and their variants, while maintaining strong performance in unseen complex environments, thereby improving both coordination efficiency and global situational awareness.
📝 Abstract
Multi-Agent Pickup and Delivery (MAPD) is a challenging extension of Multi-Agent Path Finding (MAPF), where agents are required to sequentially complete tasks with fixed-location pickup and delivery demands. Although learning-based methods have made progress in MAPD, they often perform poorly in warehouse-like environments with narrow pathways and long corridors when relying only on local observations for distributed decision-making. Communication learning can alleviate the lack of global information, but it introduces high computational complexity due to point-to-point communication. To address this challenge, we formulate MAPF as a sequence modeling problem and prove that path-finding policies under sequence modeling possess order-invariant optimality, ensuring their effectiveness in MAPD. Building on this, we propose the Sequential Pathfinder (SePar), which leverages the Transformer paradigm to achieve implicit information exchange, reducing decision-making complexity from exponential to linear while maintaining efficiency and global awareness. Experiments demonstrate that SePar consistently outperforms existing learning-based methods across various MAPF tasks and their variants, and generalizes well to unseen environments. Furthermore, we highlight the necessity of integrating imitation learning for complex maps such as warehouses.
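The exponential-to-linear claim can be illustrated with a small counting sketch. It is not SePar's Transformer policy: the five-action move set, the function names, and `toy_score` are all assumptions made for illustration. The sketch compares how many joint actions a centralized policy would have to score at one step with how many per-agent scores are needed when agents choose one after another, each conditioned on the earlier choices.

```python
# Assumed grid move set; the abstract does not specify the action space.
ACTIONS = ("stay", "up", "down", "left", "right")


def joint_action_count(num_agents, num_actions=len(ACTIONS)):
    """Joint actions a centralized policy must consider at one step."""
    return num_actions ** num_agents


def sequential_choice_count(num_agents, num_actions=len(ACTIONS)):
    """Per-agent action scores needed when agents decide one at a time."""
    return num_agents * num_actions


def decide_sequentially(num_agents, score):
    """Pick one action per agent, each conditioned on the earlier picks.

    `score(agent, action, earlier)` is a hypothetical stand-in for a
    learned sequence model; it is not the paper's network.
    """
    chosen = []
    for agent in range(num_agents):
        earlier = tuple(chosen)
        best = max(ACTIONS, key=lambda a: score(agent, a, earlier))
        chosen.append(best)
    return chosen


def toy_score(agent, action, earlier):
    """Toy rule: avoid any action already taken by an earlier agent."""
    return 0.0 if action in earlier else 1.0


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, joint_action_count(n), sequential_choice_count(n))
    print(decide_sequentially(3, toy_score))
```

With 8 agents and 5 actions, the joint space has 5^8 = 390625 entries, while the sequential pass scores only 8 × 5 = 40 agent-action pairs. The price is that each agent's choice depends on the order of the earlier ones, which is why the order-invariant optimality result matters for this formulation.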