🤖 AI Summary
Existing continual learning (CL) benchmarks lack support for cooperative multi-agent continual learning (CMARL) and rely predominantly on CPU-based simulation, resulting in short task sequences and poor scalability. This paper introduces MEAL, the first benchmark specifically designed for CMARL. It formally defines the CMARL evaluation paradigm and proposes a JAX-based, GPU-accelerated simulation framework enabling hundred-task continual sequences, fully trainable within hours on standard desktop hardware. Using integrated implementations of PPO, experience replay (ER), and MASER, together with systematic ablation studies, MEAL reveals the critical roles of coordination mechanisms and network architecture in multi-agent continual adaptation. Experiments demonstrate that mainstream CL+MARL approaches perform well in simple environments but fail in complex, long-horizon cooperative scenarios requiring sustained inter-agent coordination.
📝 Abstract
Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms, with environment availability strongly impacting research. One particularly underexplored intersection is continual learning (CL) in cooperative multi-agent settings. To remedy this, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark tailored for continual multi-agent reinforcement learning (CMARL). Existing CL benchmarks run environments on the CPU, leading to computational bottlenecks and limiting the length of task sequences. MEAL leverages JAX for GPU acceleration, enabling continual learning across sequences of 100 tasks on a standard desktop PC in a few hours. We show that naively combining popular CL and MARL methods yields strong performance on simple environments, but fails to scale to more complex settings requiring sustained coordination and adaptation. Our ablation study identifies architectural and algorithmic features critical for CMARL on MEAL.
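MEAL's actual API is not reproduced here, but the abstract's core engineering claim (that JAX's GPU acceleration makes 100-task continual sequences cheap) can be illustrated with a minimal, hypothetical sketch. The names `train_on_task` and `continual_sequence`, and the toy quadratic "task", are invented for illustration; the point is only the pattern: `jax.jit` compiles the entire inner training loop and `jax.vmap` batches over parallel environments, so a sequential task stream runs end-to-end on the accelerator.

```python
import jax
import jax.numpy as jnp


def loss(params, targets):
    # Toy stand-in for a cooperative objective: a shared parameter
    # vector scored against each parallel environment's target,
    # vmapped over the environment batch axis.
    per_env = jax.vmap(lambda t: jnp.sum((params - t) ** 2))(targets)
    return per_env.mean()


@jax.jit
def train_on_task(params, targets):
    # jit + lax.fori_loop keep the whole inner optimization loop on
    # the device; this is what makes long task sequences cheap.
    lr, steps = 0.1, 100  # hyperparameters baked into the compiled loop

    def body(_, p):
        return p - lr * jax.grad(loss)(p, targets)

    return jax.lax.fori_loop(0, steps, body, params)


def continual_sequence(key, n_tasks=5, dim=4, n_envs=8):
    # The continual-learning setting: one set of parameters trained
    # sequentially on a stream of tasks, with no reset in between.
    params = jnp.zeros(dim)
    for _ in range(n_tasks):
        key, sub = jax.random.split(key)
        targets = jax.random.normal(sub, (n_envs, dim))
        params = train_on_task(params, targets)
    return params
```

In a real benchmark the quadratic objective would be replaced by rollouts of a jit-compatible environment and a PPO update, but the jit/vmap structure stays the same.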