OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination

📅 2025-03-22
🤖 AI Summary
The original Overcooked environment is a popular benchmark for zero-shot coordination (ZSC), but this paper argues that it evaluates ZSC poorly: failure there stems largely from poor state coverage under self-play rather than from more sophisticated coordination challenges. A state augmentation mechanism that mixes states likely to arise with unknown partners into the training distribution is enough for independently trained agents to coordinate successfully, so the authors conclude that Overcooked is not suitable as a ZSC benchmark. To address this, they introduce OvercookedV2, which adds asymmetric information and stochasticity. Experiments on OvercookedV2 show that exhaustive state coverage alone is insufficient to coordinate well, and new challenges that require test-time protocol formation demonstrate the need for coordination algorithms that adapt online. The authors intend OvercookedV2 as a benchmark for the next generation of ZSC algorithms and for collaboration between AI agents and humans.

📝 Abstract
AI agents hold the potential to transform everyday life by helping humans achieve their goals. To do this successfully, agents need to be able to coordinate with novel partners without prior interaction, a setting known as zero-shot coordination (ZSC). Overcooked has become one of the most popular benchmarks for evaluating coordination capabilities of AI agents and learning algorithms. In this work, we investigate the origins of ZSC challenges in Overcooked. We introduce a state augmentation mechanism which mixes states that might be encountered when paired with unknown partners into the training distribution, reducing the out-of-distribution challenge associated with ZSC. We show that independently trained agents under this algorithm coordinate successfully in Overcooked. Our results suggest that ZSC failure can largely be attributed to poor state coverage under self-play rather than more sophisticated coordination challenges. The Overcooked environment is therefore not suitable as a ZSC benchmark. To address these shortcomings, we introduce OvercookedV2, a new version of the benchmark, which includes asymmetric information and stochasticity, facilitating the creation of interesting ZSC scenarios. To validate OvercookedV2, we conduct experiments demonstrating that mere exhaustive state coverage is insufficient to coordinate well. Finally, we use OvercookedV2 to build a new range of coordination challenges, including ones that require test time protocol formation, and we demonstrate the need for new coordination algorithms that can adapt online. We hope that OvercookedV2 will help benchmark the next generation of ZSC algorithms and advance collaboration between AI agents and humans.
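The state augmentation idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the extra states are collected or at what point they are mixed in, so everything here is an assumption made for illustration: the helper name `sample_start_state`, the `partner_states` pool, and the `mix_prob` parameter are hypothetical and are not taken from the paper or its code. The sketch simply starts a training episode from a pooled state with some probability and from the normal reset otherwise.

```python
import random
from typing import Callable, Sequence, TypeVar

State = TypeVar("State")


def sample_start_state(
    reset_fn: Callable[[], State],
    partner_states: Sequence[State],
    mix_prob: float,
    rng: random.Random,
) -> State:
    """Pick the state a training episode starts from.

    Hypothetical helper, not the paper's implementation.
    With probability `mix_prob`, return a state drawn from `partner_states`
    (states assumed to be reachable when paired with unknown partners);
    otherwise fall back to the environment's ordinary reset.
    """
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must lie in [0, 1]")
    # rng.random() is in [0, 1), so mix_prob=0 never mixes and mix_prob=1 always does.
    if partner_states and rng.random() < mix_prob:
        return rng.choice(partner_states)
    return reset_fn()


if __name__ == "__main__":
    rng = random.Random(0)
    pool = ["state_seen_with_partner_a", "state_seen_with_partner_b"]
    starts = [sample_start_state(lambda: "default_reset", pool, 0.5, rng) for _ in range(10)]
    print(starts)
```

Under this assumption, `mix_prob` controls how far the training distribution is shifted toward states that self-play alone would rarely visit; setting it to 0 recovers plain self-play resets.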
Problem

Research questions and friction points this paper is trying to address.

Investigates the origins of zero-shot coordination (ZSC) challenges in the Overcooked environment
Proposes OvercookedV2, which adds asymmetric information and stochasticity for better ZSC benchmarking
Demonstrates the need for new coordination algorithms that can adapt online
Innovation

Methods, ideas, or system contributions that make the work stand out.

State augmentation for zero-shot coordination
OvercookedV2 introduces asymmetric information
New benchmark for ZSC algorithm testing
Tobias Gessler
FLAIR, University of Oxford
Tin Dizdarevic
FLAIR, University of Oxford
Ani Calinescu
FLAIR, University of Oxford
Benjamin Ellis
FLAIR, University of Oxford
Andrei Lupu
University of Oxford & FAIR, Meta AI
Reinforcement Learning, Multi-Agent RL
Jakob Nicolaus Foerster
FLAIR, University of Oxford