🤖 AI Summary
The multi-agent pathfinding (MAPF) community has long lacked an open, unified benchmark that supports both learning and evaluation, hindering fair, cross-paradigm comparison among classical algorithms, multi-agent reinforcement learning (MARL) approaches, and hybrid methods.
Method: We introduce POGEMA, an open-source MAPF benchmark platform featuring a fast, configurable training environment, a problem instance generator, a standardized suite of predefined test instances, a visualization toolkit, and an automated benchmarking system. Crucially, we propose a unified evaluation protocol built on multi-dimensional metrics, computed from primary indicators such as success rate, path length, and computational overhead, to enable rigorous, paradigm-agnostic comparison.
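To make the workflow concrete, below is a minimal usage sketch in Python following the Gymnasium-style interface the platform exposes. The `pogema_v0` and `GridConfig` names match POGEMA's public API, but exact signatures and defaults may differ across versions, so treat this as illustrative rather than authoritative.

```python
from pogema import pogema_v0, GridConfig

# Configure a randomly generated 16x16 grid with 8 agents and
# partial observability; the seed fixes the instance for reproducibility.
env = pogema_v0(grid_config=GridConfig(
    size=16,               # side length of the square grid
    density=0.3,           # fraction of cells occupied by obstacles
    num_agents=8,
    obs_radius=5,          # each agent sees a (2r+1)x(2r+1) egocentric window
    max_episode_steps=128,
    seed=42,
))

obs, info = env.reset()
while True:
    # A random policy stands in for a learned or search-based solver;
    # sample_actions() is taken from the project's README example.
    obs, reward, terminated, truncated, info = env.step(env.sample_actions())
    if all(terminated) or all(truncated):
        break
```

The same environment object serves both roles the summary describes: it is fast enough for MARL training loops, and, with a fixed seed, it yields the reproducible instances needed for evaluation.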
Contribution/Results: The platform enables reproducible evaluation of over ten state-of-the-art methods, improving the rigor, consistency, and scalability of algorithmic comparison. It establishes standardized infrastructure for MAPF research, fostering transparency, reproducibility, and systematic progress.
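As a sketch of how the protocol's headline metrics can be aggregated from per-instance results, the snippet below computes success rate, average sum-of-costs, and mean runtime. The `EpisodeResult` record and `summarize` helper are hypothetical illustrations of the idea, not part of the platform's API.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class EpisodeResult:           # hypothetical record of one evaluated instance
    solved: bool               # did every agent reach its goal within the step limit?
    path_lengths: list[int]    # per-agent path lengths (steps taken)
    runtime_s: float           # wall-clock time spent by the solver

def summarize(results: list[EpisodeResult]) -> dict:
    """Aggregate primary indicators into headline metrics."""
    solved = [r for r in results if r.solved]
    return {
        "success_rate": len(solved) / len(results),
        # Sum-of-costs is averaged over solved instances only, since
        # path length is not well defined for failed runs.
        "avg_sum_of_costs": (mean(sum(r.path_lengths) for r in solved)
                             if solved else float("nan")),
        "avg_runtime_s": mean(r.runtime_s for r in results),
    }
```

Restricting path-length statistics to solved instances is a common convention in MAPF evaluation; whichever convention a benchmark adopts, applying it uniformly across classical, learning-based, and hybrid solvers is what makes the comparison fair.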
📝 Abstract
Multi-agent reinforcement learning (MARL) has recently excelled at solving challenging cooperative and competitive multi-agent problems in various environments, typically involving a small number of agents and full observability. Moreover, a range of crucial robotics-related tasks, such as multi-robot pathfinding, which have traditionally been approached with classical non-learnable methods (e.g., heuristic search), are now increasingly being addressed with learning-based or hybrid methods. However, in this domain, it remains difficult, if not impossible, to conduct a fair comparison between classical, learning-based, and hybrid approaches due to the lack of a unified framework that supports both learning and evaluation. To address this, we introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, a collection of predefined problem instances, a visualization toolkit, and a benchmarking tool for automated evaluation. We also define an evaluation protocol that specifies a range of domain-related metrics, computed from primary evaluation indicators (such as success rate and path length), enabling a fair, multi-fold comparison. We present the results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods.