🤖 AI Summary
This paper addresses insufficient policy robustness in Ad Hoc Teamwork (AHT), which arises when the distribution of possible teammates is unknown. To tackle this, we propose a minimax-Bayes framework: prior to deployment, it jointly optimizes the agent's policy and the Bayesian inference process against the worst-case teammate prior (an adversarial distribution), thereby eliminating reliance on a fixed teammate model. Our key contribution is the first integration of minimax optimization with Bayesian policy inference, yielding provable robustness guarantees under teammate uncertainty. Experiments on cooking tasks from the Melting Pot benchmark show that our method significantly outperforms self-play, fictitious play, and best-response learning, achieving better worst-case performance and stronger generalization to diverse unseen teammates.
📝 Abstract
We propose a minimax-Bayes approach to Ad Hoc Teamwork (AHT) that optimizes policies against an adversarial prior over partners, explicitly accounting for uncertainty about partners at deployment time. Unlike existing methods that assume a specific distribution over partners, our approach improves worst-case performance guarantees. Extensive experiments, including evaluations on coordinated cooking tasks from the Melting Pot suite, show our method's superior robustness compared to self-play, fictitious play, and best-response learning. Our work highlights the importance of selecting an appropriate training distribution over teammates to achieve robustness in AHT.
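As a rough illustration of the minimax-Bayes idea (not the paper's actual algorithm), the sketch below casts teammate uncertainty as a two-player zero-sum game over a toy payoff matrix: the agent mixes over candidate policies while an adversary chooses the prior over teammate types, and both sides are updated with multiplicative weights. The payoff matrix `U`, the learning rate, and the iteration count are all illustrative assumptions.

```python
import numpy as np

# Hypothetical toy payoff matrix: rows = agent policies, cols = teammate types.
# U[i, j] = expected team return when agent policy i is paired with type j.
# Policy 0 is great with type 0 but fails with type 1; policy 1 is balanced.
U = np.array([
    [4.0, 0.0],
    [1.0, 3.0],
])

def minimax_bayes_prior(U, iters=5000, lr=0.1):
    """Approximate the robust agent mixture and the adversarial (worst-case)
    teammate prior via multiplicative-weights updates on both players."""
    n_pi, n_types = U.shape
    pi = np.ones(n_pi) / n_pi        # agent's mixture over policies
    beta = np.ones(n_types) / n_types  # adversary's prior over teammate types
    pi_sum = np.zeros(n_pi)
    beta_sum = np.zeros(n_types)
    for _ in range(iters):
        # Agent ascends on expected return; adversarial prior descends.
        pi = pi * np.exp(lr * (U @ beta))
        pi /= pi.sum()
        beta = beta * np.exp(-lr * (U.T @ pi))
        beta /= beta.sum()
        pi_sum += pi
        beta_sum += beta
    # Time-averaged strategies approximate the minimax solution.
    return pi_sum / iters, beta_sum / iters

pi_star, beta_star = minimax_bayes_prior(U)
# Worst-case return of the robust mixture over single teammate types.
worst_case_value = (pi_star @ U).min()
```

In this toy game each pure policy has a worst-case return of at most 1, while the averaged minimax mixture approaches the game value of 2, mirroring the paper's point that optimizing against an adversarial prior improves worst-case performance over committing to any single best response.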