🤖 AI Summary
In human–agent collaboration, agents must rapidly identify and adapt to unknown human partners’ strategies under time pressure and within complex strategy spaces. Method: We propose TALENTS, a strategy-conditioned cooperator framework for online adaptation to unfamiliar partners. A variational autoencoder learns a latent strategy representation from interaction trajectories, the latent space is clustered into strategy types, and a cooperator agent is trained conditioned on those clusters. During interaction, a fixed-share regret minimization algorithm infers the partner's strategy and updates the estimate dynamically. Contribution/Results: Evaluated in a customized Overcooked environment through an online user study, TALENTS outperforms current baselines when collaborating with unfamiliar human partners, which indicates that it can adapt to previously unseen partners in real time.
📝 Abstract
In collaborative tasks, adapting to one's teammates is necessary for success. When teammates are heterogeneous, as in human-agent teams, agents need to observe, recognize, and adapt to their human partners in real time. This becomes particularly challenging in tasks with time pressure and complex strategy spaces, where the dynamics can change rapidly. In this work, we introduce TALENTS, a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to a range of partner strategies, enabling ad-hoc teamwork. Our approach uses a variational autoencoder to learn a latent strategy space from trajectory data. This latent space represents the underlying strategies that agents employ. The system then identifies different types of strategy by clustering the data in this space. Finally, a cooperator agent is trained to generate partners for each type of strategy, conditioned on these clusters. To adapt to previously unseen partners, we use a fixed-share regret minimization algorithm that infers and adjusts the estimated partner strategy dynamically. We assess our approach in a customized version of the Overcooked environment, which poses a challenging cooperative cooking task that demands strong coordination across a wide range of possible strategies. In an online user study, we show that our agent outperforms current baselines when working with unfamiliar human partners.
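
To make the fixed-share inference step more concrete, here is a minimal Python sketch of a textbook fixed-share update over a small set of strategy clusters. It is an illustration under stated assumptions, not the authors' implementation: the function names, the learning rate `eta`, the share rate `alpha`, and the 0/1 per-cluster losses are all hypothetical, since the abstract does not specify them. The sketch uses the uniform-share variant, where a small fraction of weight is redistributed evenly after each exponential-weights step so that the estimate can recover when a partner switches strategy.

```python
import numpy as np


def fixed_share_update(weights, losses, eta=1.0, alpha=0.05):
    """One fixed-share step over K candidate strategy clusters.

    weights: current belief over clusters (non-negative, sums to 1).
    losses:  per-cluster loss for the latest observation (assumed here;
             lower means the cluster explains the partner better).
    eta, alpha: hypothetical learning rate and share rate.
    """
    weights = np.asarray(weights, dtype=float)
    losses = np.asarray(losses, dtype=float)
    # Exponential-weights step. Subtracting the minimum loss only improves
    # numerical stability; it cancels out after normalization.
    posterior = weights * np.exp(-eta * (losses - losses.min()))
    posterior /= posterior.sum()
    # Share step: mix in uniform mass so every cluster keeps at least
    # alpha / K weight and can regain the lead after a strategy switch.
    k = len(posterior)
    return (1.0 - alpha) * posterior + alpha / k


def simulate_switch(steps_per_phase=20):
    """Toy run: the partner follows cluster 0, then switches to cluster 2.

    Returns the most likely cluster after every step.
    """
    weights = np.full(3, 1.0 / 3.0)
    estimates = []
    for true_cluster in (0, 2):
        for _ in range(steps_per_phase):
            # Assumed 0/1 loss: zero for the matching cluster, one otherwise.
            losses = np.ones(3)
            losses[true_cluster] = 0.0
            weights = fixed_share_update(weights, losses)
            estimates.append(int(weights.argmax()))
    return estimates


if __name__ == "__main__":
    print(simulate_switch())
```

In this toy run the estimate follows cluster 0 during the first phase and moves to cluster 2 a few steps after the switch. Without the share step, the weight on cluster 2 would keep shrinking during the first phase, and recovery after the switch would take correspondingly longer.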