Transformer World Model for Sample Efficient Multi-Agent Reinforcement Learning

📅 2025-06-23
🤖 AI Summary
To address low sample efficiency, non-stationarity, and the difficulty of modeling other agents' behavior in partially observable multi-agent reinforcement learning (MARL), this paper proposes MATWM, a Transformer-based world model framework for MARL. Methodologically, it introduces: (1) a decentralized imagination mechanism enabling efficient policy planning within implicit individual world models; (2) a semi-centralized critic jointly optimized with prioritized experience replay; and (3) an explicit teammate behavior prediction module to mitigate non-stationarity. The framework natively supports both vector and image inputs. Evaluated on StarCraft II, PettingZoo, and Melting Pot benchmarks, MATWM achieves state-of-the-art performance, converging to near-optimal policies with only 50K environment interactions, demonstrating significantly higher sample efficiency than existing methods.

📝 Abstract
We present the Multi-Agent Transformer World Model (MATWM), a novel transformer-based world model designed for multi-agent reinforcement learning in both vector- and image-based environments. MATWM combines a decentralized imagination framework with a semi-centralized critic and a teammate prediction module, enabling agents to model and anticipate the behavior of others under partial observability. To address non-stationarity, we incorporate a prioritized replay mechanism that trains the world model on recent experiences, allowing it to adapt to agents' evolving policies. We evaluated MATWM on a broad suite of benchmarks, including the StarCraft Multi-Agent Challenge, PettingZoo, and MeltingPot. MATWM achieves state-of-the-art performance, outperforming both model-free and prior world model approaches, while demonstrating strong sample efficiency, achieving near-optimal performance in as few as 50K environment interactions. Ablation studies confirm the impact of each component, with substantial gains in coordination-heavy tasks.
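The abstract describes a prioritized replay mechanism that biases world-model training toward recent experiences so the model tracks agents' evolving policies. A minimal sketch of one such recency-biased sampling scheme is shown below; the exponential weighting and the function name are illustrative assumptions, not the paper's exact mechanism.

```python
import random

def recency_prioritized_sample(buffer, batch_size, decay=0.99):
    """Sample buffer indices with probability weighted toward recent entries.

    Hypothetical sketch: newer experiences (higher index) receive
    exponentially larger sampling weights, so world-model updates
    emphasize data generated by the agents' current policies.
    """
    n = len(buffer)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return random.choices(range(n), weights=weights, k=batch_size)

# Usage: sample a training batch from a stand-in transition buffer.
buffer = list(range(1000))
batch_indices = recency_prioritized_sample(buffer, batch_size=32)
```

With `decay=0.99`, an experience 100 steps older than the newest is roughly e times less likely to be sampled, giving a soft sliding window rather than a hard cutoff.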
Problem

Research questions and friction points this paper is trying to address.

Efficient multi-agent learning in partial observability
Addressing non-stationarity with prioritized replay
Improving coordination in transformer-based world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based world model for multi-agent learning
Decentralized imagination with semi-centralized critic
Prioritized replay for non-stationarity adaptation
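The teammate behavior prediction module listed above, as described in the abstract, lets each agent anticipate others' actions under partial observability. A minimal sketch of such a prediction head is below; the linear-plus-softmax form, names, and dimensions are illustrative assumptions, since the paper's architecture is not detailed on this page.

```python
import numpy as np

def predict_teammate_actions(z, W, n_teammates, n_actions):
    """Hypothetical teammate-prediction head.

    Maps an agent's latent state z (from its world model) through a
    linear layer W, then applies a per-teammate softmax to produce a
    probability distribution over each teammate's next action.
    """
    logits = (z @ W).reshape(n_teammates, n_actions)
    # Numerically stable softmax along the action axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Usage: 8-dim latent state, 3 teammates, 4 discrete actions each.
rng = np.random.default_rng(0)
z = rng.standard_normal(8)
W = rng.standard_normal((8, 3 * 4))
probs = predict_teammate_actions(z, W, n_teammates=3, n_actions=4)
```

Training such a head against teammates' actually taken actions gives the agent an explicit signal about their current policies, which is one way to mitigate the non-stationarity the summary highlights.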
Azad Deihim
Artificial Intelligence Research Centre (CitAI), City St George’s, University of London
Eduardo Alonso
Artificial Intelligence Research Centre (CitAI), City St George’s, University of London
Dimitra Apostolopoulou
Oxford Institute for Energy Studies
energy economics · electricity markets · energy transition