🤖 AI Summary
To address the challenge of dynamically varying and hard-to-model human partner policies in human-agent collaboration, this paper proposes the Strategy-Conditioned Collaboration Framework (SCCF) for online adaptive coordination with unknown human partners. Methodologically, SCCF employs a variational autoencoder to learn a latent policy representation, applies unsupervised clustering to identify distinct policy types, and dynamically selects the best response via fixed-share regret minimization. Evaluated in a modified Overcooked environment, SCCF significantly outperforms existing baselines in policy generalizability, response latency, and collaborative efficiency. A user study further confirms its effectiveness and robustness in real-world human-agent collaboration. The core contribution is the first integration of latent policy learning, unsupervised clustering, and regret minimization into a lightweight, online-adaptive framework, enabling efficient coordination in heterogeneous teams performing fast-paced, latency-sensitive tasks.
📝 Abstract
Adaptation is the cornerstone of effective collaboration among heterogeneous team members. In human-agent teams, artificial agents need to adapt to their human partners in real time, as individuals often have unique preferences and policies that may change dynamically throughout an interaction. This becomes particularly challenging in tasks with time pressure and complex strategic spaces, where identifying partner behaviors and selecting suitable responses is difficult. In this work, we introduce a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to a broad range of potential partner strategies in real time. Our approach encodes strategies with a variational autoencoder to learn a latent strategy space from agent trajectory data, identifies distinct strategy types through clustering, and trains a cooperator agent conditioned on these clusters by generating partners of each strategy type. For online adaptation to novel partners, we leverage a fixed-share regret minimization algorithm that dynamically infers and adjusts the partner's strategy estimate during interaction. We evaluate our method in a modified version of the Overcooked domain, a complex collaborative cooking environment that requires effective coordination between two players across a diverse potential strategy space. Through these experiments and an online user study, we demonstrate that our proposed agent achieves state-of-the-art performance compared to existing baselines when paired with novel human and agent teammates.
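To make the online-adaptation step concrete, the classic fixed-share update (Herbster and Warmuth) maintains a belief over candidate strategy clusters, penalizes clusters that explain the partner's recent behavior poorly, and then mixes in a small uniform share so the belief can track a partner who switches strategies mid-episode. The sketch below is a minimal, simplified variant (uniform sharing rather than sharing only to other experts); the function name, `eta`, and `alpha` values are illustrative assumptions, not the paper's exact implementation.

```python
import math

def fixed_share_update(weights, losses, eta=1.0, alpha=0.1):
    """One fixed-share step over strategy-cluster beliefs.

    weights: current belief over the N candidate strategy clusters (sums to 1)
    losses:  per-cluster loss this step, e.g. negative log-likelihood of the
             partner's observed action under each cluster's policy
    eta:     learning rate of the multiplicative-weights step
    alpha:   switching rate; fraction of mass redistributed uniformly
    """
    # Multiplicative-weights step: downweight clusters whose policy
    # predicted the partner's behavior poorly.
    v = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(v)
    v = [x / total for x in v]
    # Fixed-share mixing: keep a floor of alpha/N on every cluster so the
    # belief can recover quickly if the partner switches strategies.
    n = len(v)
    return [(1 - alpha) * x + alpha / n for x in v]
```

With four clusters and a loss of 1 on the first cluster only, the update shifts belief toward the three well-predicting clusters while the `alpha / n` floor keeps the first cluster recoverable.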