MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning

2025-06-17
AI Summary
Existing continual learning (CL) benchmarks lack support for cooperative continual multi-agent reinforcement learning (CMARL) and rely predominantly on CPU-based simulation, which limits them to short task sequences and poor scalability. This paper introduces MEAL, the first benchmark specifically designed for CMARL. It formally defines the CMARL evaluation paradigm and proposes a JAX-based, GPU-accelerated simulation framework that supports continual sequences of 100 tasks, fully trainable within hours on standard desktop hardware. Through integrated implementations of PPO, experience replay (ER), and MASER, together with systematic ablation studies, MEAL reveals the critical roles of coordination mechanisms and network architecture in multi-agent continual adaptation. Experiments demonstrate that mainstream CL+MARL approaches perform well in simple environments but fail in complex, long-horizon cooperative scenarios that require sustained inter-agent coordination.

Abstract
Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms, with environment availability strongly impacting research. One particularly underexplored intersection is continual learning (CL) in cooperative multi-agent settings. To remedy this, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark tailored for continual multi-agent reinforcement learning (CMARL). Existing CL benchmarks run environments on the CPU, leading to computational bottlenecks and limiting the length of task sequences. MEAL leverages JAX for GPU acceleration, enabling continual learning across sequences of 100 tasks on a standard desktop PC in a few hours. We show that naively combining popular CL and MARL methods yields strong performance on simple environments, but fails to scale to more complex settings requiring sustained coordination and adaptation. Our ablation study identifies architectural and algorithmic features critical for CMARL on MEAL.
Problem

Research questions and friction points this paper is trying to address.

Lack of benchmarks for continual multi-agent reinforcement learning
CPU-based CL benchmarks cause computational bottlenecks
Existing methods fail in complex coordination and adaptation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

First benchmark for continual multi-agent RL
Uses JAX for GPU acceleration
Enables 100-task sequences on desktop
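
To illustrate why running environments in JAX can remove the CPU bottleneck, here is a minimal sketch of a batched, jit-compiled environment step rolled out over a short task sequence. It is not MEAL's API: the toy two-agent environment, `env_step`, `rollout_task`, the goal-based task definition, and all sizes are hypothetical and chosen only for illustration. Random actions stand in for a learned policy.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy environment (not from MEAL): two agents move on a 1-D
# line of 10 cells and share a reward for being close to a task-specific goal.
def env_step(state, actions, goal):
    # state: (2,) integer agent positions; actions: (2,) values in {-1, 0, 1}
    new_state = jnp.clip(state + actions, 0, 9)
    # Shared (cooperative) reward: negative total distance to the goal.
    reward = -jnp.abs(new_state - goal).sum().astype(jnp.float32)
    return new_state, reward

# Vectorize over a batch of environments and compile once.
# The goal is shared across the batch, hence in_axes=None for it.
batched_step = jax.jit(jax.vmap(env_step, in_axes=(0, 0, None)))

def rollout_task(key, goal, num_envs=1024, horizon=16):
    """Roll out random actions in num_envs parallel copies of one task."""
    init_state = jnp.zeros((num_envs, 2), dtype=jnp.int32)

    def body(state, step_key):
        # Random actions in {-1, 0, 1}; a real agent would query its policy here.
        actions = jax.random.randint(step_key, (num_envs, 2), -1, 2)
        state, reward = batched_step(state, actions, goal)
        return state, reward

    step_keys = jax.random.split(key, horizon)
    final_state, rewards = jax.lax.scan(body, init_state, step_keys)
    return final_state, rewards  # rewards has shape (horizon, num_envs)

# A continual sequence: tasks are visited one after another.
key = jax.random.PRNGKey(0)
task_goals = [3, 7, 5]  # hypothetical task sequence
mean_returns = []
for goal in task_goals:
    key, task_key = jax.random.split(key)
    final_state, rewards = rollout_task(task_key, goal)
    # Episode return per environment, averaged over the batch.
    mean_returns.append(float(rewards.sum(axis=0).mean()))
```

Because the step function is vectorized with `jax.vmap` and the time loop is expressed with `jax.lax.scan`, simulation of the whole batch stays on the accelerator, and the same pattern would apply to a longer list of tasks.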