Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Learning-based multi-agent pathfinding (MAPF) solvers still face limits in scalability and solution quality. This paper proposes MAPF-GPT-DDG, which builds on MAPF-GPT, a decentralized solver trained purely by imitation learning, and fine-tunes the pre-trained model using data from a centralized expert. Its core contributions are an active fine-tuning strategy and a delta-data generation mechanism that accelerates training while improving test-time performance. In the reported experiments, MAPF-GPT-DDG surpasses all existing learning-based MAPF solvers, including the original MAPF-GPT, in solution quality across many test scenarios, and it handles instances with up to 1 million agents in a single environment. The authors present this as a new scalability milestone for MAPF, with relevance to applications such as logistics and search-and-rescue.

📝 Abstract
Multi-agent pathfinding (MAPF) is a common abstraction of multi-robot trajectory planning problems, where multiple homogeneous robots move simultaneously in a shared environment. While solving MAPF optimally has been proven to be NP-hard, scalable and efficient solvers are vital for real-world applications such as logistics and search-and-rescue. To this end, decentralized suboptimal MAPF solvers that leverage machine learning have emerged. Building on the success of the recently introduced MAPF-GPT, a pure imitation learning solver, we introduce MAPF-GPT-DDG. This novel approach effectively fine-tunes the pre-trained MAPF model using centralized expert data. Leveraging a novel delta-data generation mechanism, MAPF-GPT-DDG accelerates training while significantly improving performance at test time. Our experiments demonstrate that MAPF-GPT-DDG surpasses all existing learning-based MAPF solvers, including the original MAPF-GPT, in solution quality across many testing scenarios. Remarkably, it can work with MAPF instances involving up to 1 million agents in a single environment, setting a new milestone for scalability in MAPF domains.
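
To make the MAPF abstraction concrete, the sketch below checks a set of grid paths for the two standard conflict types: two agents occupying the same cell at the same timestep (vertex conflict) and two agents swapping cells in one move (edge conflict). It is a minimal illustration of the problem definition only, not the paper's method; the names `position_at` and `find_conflicts`, the grid-cell representation, and the convention that agents wait at their goal once their path ends are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]   # (row, col) on a grid; assumed representation
Path = List[Cell]        # one cell per timestep, starting at t = 0


def position_at(path: Path, t: int) -> Cell:
    """Return the agent's cell at time t; agents are assumed to wait at their last cell."""
    return path[min(t, len(path) - 1)]


def find_conflicts(paths: Dict[str, Path]) -> List[Tuple[str, int, str, str]]:
    """Return (kind, timestep, agent_a, agent_b) for each vertex or edge conflict.

    Expects at least one non-empty path.
    """
    conflicts = []
    agents = sorted(paths)
    horizon = max(len(p) for p in paths.values())
    for t in range(horizon):
        for i, a in enumerate(agents):
            for b in agents[i + 1:]:
                a_now, b_now = position_at(paths[a], t), position_at(paths[b], t)
                # Vertex conflict: both agents in the same cell at time t.
                if a_now == b_now:
                    conflicts.append(("vertex", t, a, b))
                    continue
                # Edge conflict: the agents swap cells between t and t + 1.
                if t + 1 < horizon:
                    a_next = position_at(paths[a], t + 1)
                    b_next = position_at(paths[b], t + 1)
                    if a_now == b_next and b_now == a_next:
                        conflicts.append(("edge", t, a, b))
    return conflicts


if __name__ == "__main__":
    # Two agents crossing a corridor head-on meet in the middle cell.
    head_on = {"a": [(0, 0), (0, 1), (0, 2)], "b": [(0, 2), (0, 1), (0, 0)]}
    print(find_conflicts(head_on))
```

A valid MAPF solution is a set of paths for which such a check returns an empty list. The pairwise loop here is quadratic in the number of agents and is meant only to show the definition, not to scale to the instance sizes discussed in the abstract.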

Problem

Research questions and friction points this paper is trying to address.

Enhancing multi-agent pathfinding solvers via active fine-tuning
Improving scalability in large-scale multi-robot trajectory planning
Boosting performance of learning-based MAPF solvers with delta-data generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Active fine-tuning with centralized expert data
Delta-data generation for accelerated training
Scalable to 1 million agents in single environment