Modeling Latent Partner Strategies for Adaptive Zero-Shot Human-Agent Collaboration

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: In human-agent collaboration, an agent must quickly identify and adapt to an unknown human partner's strategy, often under time pressure and in a large strategy space. Method: We propose TALENTS, a strategy-conditioned cooperator framework for zero-shot online adaptation. A variational autoencoder learns a latent strategy space from interaction trajectories, the latent space is clustered into strategy types, and a cooperator agent is trained conditioned on those clusters. At test time, a fixed-share regret-minimization algorithm infers and continually updates the estimate of the partner's strategy. Contribution/Results: In an online user study on a customized Overcooked environment, TALENTS outperforms current baselines when paired with unfamiliar human partners.

📝 Abstract
In collaborative tasks, being able to adapt to your teammates is a necessary requirement for success. When teammates are heterogeneous, such as in human-agent teams, agents need to be able to observe, recognize, and adapt to their human partners in real time. This becomes particularly challenging in tasks with time pressure and complex strategic spaces where the dynamics can change rapidly. In this work, we introduce TALENTS, a strategy-conditioned cooperator framework that learns to represent, categorize, and adapt to a range of partner strategies, enabling ad-hoc teamwork. Our approach utilizes a variational autoencoder to learn a latent strategy space from trajectory data. This latent space represents the underlying strategies that agents employ. Subsequently, the system identifies different types of strategy by clustering the data. Finally, a cooperator agent is trained to generate partners for each type of strategy, conditioned on these clusters. In order to adapt to previously unseen partners, we leverage a fixed-share regret minimization algorithm that infers and adjusts the estimated partner strategy dynamically. We assess our approach in a customized version of the Overcooked environment, posing a challenging cooperative cooking task that demands strong coordination across a wide range of possible strategies. Using an online user study, we show that our agent outperforms current baselines when working with unfamiliar human partners.
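
The abstract does not spell out the fixed-share update, so the following is only a minimal sketch of the textbook fixed-share forecaster applied to a belief over strategy clusters. The function name `fixed_share_update`, the parameters `eta` and `alpha`, and the 0/1 losses are illustrative assumptions, not details taken from the paper.

```python
import math

def fixed_share_update(weights, losses, eta=1.0, alpha=0.05):
    """One step of the textbook fixed-share forecaster (hypothetical helper).

    weights: current belief over candidate strategy clusters (sums to 1)
    losses:  per-cluster loss for the latest observation (assumed in [0, 1])
    eta:     learning rate (assumed value)
    alpha:   share rate that keeps every cluster recoverable (assumed value)
    """
    # Exponential-weights step: shrink clusters that explained the partner poorly.
    scaled = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    posterior = [s / total for s in scaled]
    # Share step: mix with the uniform distribution so no weight collapses to
    # zero, which lets the belief follow a partner who switches strategy.
    n = len(weights)
    return [(1.0 - alpha) * p + alpha / n for p in posterior]

# Toy run with three hypothetical clusters: the partner behaves like
# cluster 0 for 20 steps, then switches to cluster 2.
belief = [1.0 / 3.0] * 3
for step in range(40):
    true_cluster = 0 if step < 20 else 2
    losses = [0.0 if k == true_cluster else 1.0 for k in range(3)]
    belief = fixed_share_update(belief, losses)

best = max(range(3), key=lambda k: belief[k])
```

In this toy run the belief first concentrates on cluster 0 and then moves to cluster 2 after the switch. Without the share step, the weight on cluster 2 would have decayed exponentially during the first phase and recovery would take far longer.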

Problem

Research questions and friction points this paper is trying to address.

Adapting to diverse human partner strategies in real-time collaboration
Learning and categorizing latent strategy spaces from trajectory data
Enabling ad-hoc teamwork with unseen partners via dynamic strategy inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses variational autoencoder for latent strategy learning
Clusters data to identify diverse strategy types
Employs regret minimization for dynamic partner adaptation
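
The clustering step listed above can be illustrated with a small sketch. The page does not say which clustering algorithm the authors use, so plain k-means is assumed here purely for illustration. The 2-D `latents` stand in for VAE encodings of trajectories, and the `kmeans` helper is hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means on tuples; returns (centroids, labels)."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each latent code to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels

# Hypothetical 2-D latent codes for six trajectories, forming two groups.
latents = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.0), (3.0, 3.1), (2.9, 3.2), (3.1, 2.8)]
centroids, labels = kmeans(latents, k=2)
```

Each resulting label would play the role of a strategy type that a cooperator could be conditioned on. The number of clusters `k` is a free choice in this sketch.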