🤖 AI Summary
Addressing the challenge of partner-agnostic collaboration in ad-hoc multi-agent settings, where partners are unknown at test time and strong generalization is required, this paper proposes an adaptive partner-generation framework that requires neither pre-trained partners nor manual hyperparameter tuning. Methodologically, it introduces the first unsupervised curriculum learning approach for cooperative multi-agent training: a variance-based learnability metric dynamically identifies the current learning bottleneck and selects partner difficulty accordingly, and, combined with stochastic policy mixing and biased random behavior, it generates diverse training partners of controllable difficulty. Evaluated on Overcooked-AI and generalization benchmarks, the method significantly outperforms state-of-the-art baselines, and a user study further confirms improvements in task reward, partner adaptability, and collaboration naturalness. Key contributions: (1) the first fully unsupervised multi-agent curriculum learning mechanism; and (2) a learnable, interpretable paradigm for controlling partner difficulty.
📝 Abstract
We introduce Unsupervised Partner Design (UPD), a population-free multi-agent reinforcement learning framework for robust ad-hoc teamwork that adaptively generates training partners without requiring pretrained partners or manual parameter tuning. UPD constructs diverse partners by stochastically mixing an ego agent's policy with biased random behaviours and scores them using a variance-based learnability metric that prioritises partners near the ego agent's current learning frontier. We show that UPD can be integrated with unsupervised environment design, yielding the first method to enable fully unsupervised curricula over both level and partner distributions in a cooperative setting. Through extensive evaluations on Overcooked-AI and the Overcooked Generalisation Challenge, we demonstrate that this dynamic partner curriculum is highly effective: UPD consistently outperforms population-based and population-free baselines as well as ablations. In a user study, UPD achieved higher returns than all baselines and was perceived as significantly more adaptive, more human-like, a better collaborator, and less frustrating.
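The two core ideas above, a variance-based learnability score and stochastic policy mixing for partner generation, can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes learnability is scored as the success-rate variance p(1 − p), and the helper names (`learnability`, `mixed_partner_action`) are hypothetical.

```python
import random

def learnability(success_probs):
    """Score candidate partner settings by success-rate variance p * (1 - p).

    Hypothetical helper: settings where the ego agent sometimes succeeds and
    sometimes fails (p near 0.5) score highest, i.e. they sit near the current
    learning frontier; trivially easy (p ~ 1) or impossible (p ~ 0) partners
    score near zero.
    """
    return [p * (1.0 - p) for p in success_probs]

def mixed_partner_action(ego_action, biased_random_action, mix_prob, rng=random):
    """Sketch of stochastic policy mixing: with probability mix_prob the
    partner copies the ego policy's action; otherwise it takes a biased
    random action. Varying mix_prob controls partner difficulty."""
    if rng.random() < mix_prob:
        return ego_action
    return biased_random_action

# Illustrative selection of the most learnable partner difficulty:
success_rates = [0.05, 0.45, 0.95]   # estimated per mixing setting (made up)
scores = learnability(success_rates)
best = max(range(len(scores)), key=scores.__getitem__)  # picks the 0.45 setting
```

The selection step captures the abstract's claim that partners near the learning frontier are prioritised: the middle setting, with the highest variance, would be chosen for training.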