Efficient Generation of Diverse Cooperative Agents with World Models

📅 2025-06-09
🤖 AI Summary
In zero-shot coordination (ZSC), two bottlenecks impede progress: insufficient diversity among partner agents and inefficient Cross-play Minimization (XPM) training. XPM relies on costly sampling of multiple trajectory types from the environment and trains each partner independently from scratch. To address these, we propose XPM-WM, a framework that integrates a world model (comprising a VAE and RSSM) into XPM. It replaces expensive real-environment trajectory sampling with model-generated simulated trajectories, removing the need to sample multiple trajectories. Moreover, a shared, reusable dynamics model supports the training of diverse partner policies, avoiding redundant per-partner training. XPM-WM matches previous methods in SP population training reward and as a source of training partners for ZSC agents, while improving sample efficiency by 3.2× and scaling partner generation to larger populations (up to hundreds).

📝 Abstract
A major bottleneck in the training process for Zero-Shot Coordination (ZSC) agents is the generation of partner agents that are diverse in collaborative conventions. Current Cross-play Minimization (XPM) methods for population generation can be very computationally expensive and sample inefficient, as the training objective requires sampling multiple types of trajectories. Each partner agent in the population is also trained from scratch, even though all partners in the population learn policies for the same coordination task. In this work, we propose that simulated trajectories from the dynamics model of an environment can drastically speed up the training process for XPM methods. We introduce XPM-WM, a framework for generating simulated trajectories for XPM via a learned World Model (WM). We show that XPM with simulated trajectories removes the need to sample multiple trajectories. In addition, we show that our proposed method can effectively generate partners with diverse conventions that match the performance of previous methods, both in terms of SP population training reward and as training partners for ZSC agents. Our method is thus significantly more sample efficient and scalable to a larger number of partners.
Problem

Research questions and friction points this paper is trying to address.

Generate diverse partner agents for Zero-Shot Coordination efficiently
Reduce computational cost of Cross-play Minimization methods
Improve sample efficiency in training cooperative agents
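
To make the cost concrete, a cross-play minimization objective for the i-th partner can be sketched in the generic form below. This is an assumed form, not taken from the source: the symbols J_SP, J_XP, and the weight λ are illustrative. The new partner is rewarded for cooperating well with itself and penalized for being compatible with partners already in the population:

\[
\max_{\pi_i} \; J_{\mathrm{SP}}(\pi_i, \pi_i) \;-\; \lambda \, \max_{j < i} J_{\mathrm{XP}}(\pi_i, \pi_j)
\]

Under this reading, estimating the first term needs self-play trajectories and the second needs cross-play trajectories against every earlier partner, which is one way to interpret the "multiple types of trajectories" the abstract refers to.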
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses simulated trajectories from world models
Eliminates need for multiple trajectory sampling
Enhances diversity and efficiency in training
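
As a rough illustration of the idea in the list above, the following sketch scores a candidate partner with rollouts "imagined" by a shared dynamics model instead of real environment steps. Everything here is an assumption made for illustration: `ToyWorldModel` is a fixed linear stand-in (not the paper's learned world model), and `make_policy`, `imagined_return`, `xpm_objective`, and the weight `lam` are hypothetical names, not the authors' implementation.

```python
import numpy as np


class ToyWorldModel:
    """Hypothetical stand-in for a learned dynamics model (linear, not learned)."""

    def __init__(self, state_dim, rng):
        self.A = 0.9 * np.eye(state_dim)                  # toy state transition
        self.B = rng.normal(scale=0.1, size=(state_dim, 2))  # effect of the two agents' actions

    def step(self, state, joint_action):
        next_state = self.A @ state + self.B @ joint_action
        reward = -float(np.sum(next_state ** 2))          # toy shared team reward
        return next_state, reward


def make_policy(weights):
    """Deterministic scalar-action policy, used only for illustration."""
    def policy(state):
        return float(np.tanh(weights @ state))
    return policy


def imagined_return(model, policy_a, policy_b, start_state, horizon=10):
    """Roll the pair of policies forward inside the model; no environment calls."""
    state, total = start_state.copy(), 0.0
    for _ in range(horizon):
        joint_action = np.array([policy_a(state), policy_b(state)])
        state, reward = model.step(state, joint_action)
        total += reward
    return total


def xpm_objective(model, new_policy, population, start_state, lam=0.5):
    """Self-play return minus a penalty on the best cross-play return (assumed form)."""
    sp = imagined_return(model, new_policy, new_policy, start_state)
    if not population:
        return sp
    xp = max(imagined_return(model, new_policy, p, start_state) for p in population)
    return sp - lam * xp


# Usage: the same model instance serves every partner in the population.
rng = np.random.default_rng(0)
model = ToyWorldModel(state_dim=3, rng=rng)
start = np.ones(3)
existing = [make_policy(np.array([0.5, -0.2, 0.1]))]
candidate = make_policy(np.array([-0.3, 0.4, 0.2]))
score = xpm_objective(model, candidate, existing, start)
```

The design point the sketch tries to convey is that self-play and cross-play returns both come from the same reusable model, so adding another partner to the population does not by itself require new environment interaction.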
Authors

Yi Loo
Singapore University of Technology and Design (SUTD)
Akshunn Trivedi
Singapore University of Technology and Design (SUTD)
Malika Meghjani
Assistant Professor, Singapore University of Technology and Design
Multi-Robot Systems, Machine Learning, Computer Vision, Marine Robotics, Self-Driving Vehicles