Nested Training for Mutual Adaptation in Human-AI Teaming

📅 2026-02-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing human-AI collaboration methods struggle to model humans' dynamic adaptation to AI strategies, leading to poor generalization when interacting with new partners. This work addresses this limitation by formulating human-AI collaboration as an interactive partially observable Markov decision process (I-POMDP), explicitly incorporating human adaptability into the state space for the first time. The authors propose a nested training framework in which a high-level agent learns by interacting with a pretrained, adaptive low-level agent, thereby avoiding the implicit coordination biases inherent in co-training approaches. Evaluated in the multi-episode, forced-cooperation Overcooked environment, the proposed method significantly outperforms current baselines in both task success rate and real-time adaptability, demonstrating markedly improved generalization to previously unseen adaptive human partners.

πŸ“ Abstract
Mutual adaptation is a central challenge in human-AI teaming, as humans naturally adjust their strategies in response to a robot's policy. Existing approaches aim to improve diversity in training partners to approximate human behavior, but these partners are static and fail to capture the adaptive behavior of humans. Exposing robots to adaptive behaviors is critical, yet when both agents learn simultaneously in a multi-agent setting, they often converge to opaque implicit coordination strategies that only work with the agents they were co-trained with. Such agents fail to generalize when paired with new partners. To capture the adaptive behavior of humans, we model the human-robot teaming scenario as an Interactive Partially Observable Markov Decision Process (I-POMDP), explicitly modeling human adaptation as part of the state. We propose a nested training regime to approximately learn the solution to a finite-level I-POMDP. In this framework, agents at each level are trained against adaptive agents from the level below. This ensures that the ego agent is exposed to adaptive behavior during training while avoiding the emergence of implicit coordination strategies, since the training partners are not themselves learning. We train our method in a multi-episode, required-cooperation setup in the Overcooked domain, comparing it against several baseline agents designed for human-robot teaming. We evaluate the performance of our agent when paired with adaptive partners that were not seen during training. Our results demonstrate that our agent not only achieves higher task performance with these adaptive partners but also exhibits significantly greater adaptability during team interactions.
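The nested training regime described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `Agent`, `train_against`, and `freeze` are hypothetical stand-ins for a real policy class and an RL training loop. The key structural idea it shows is that each level-k agent trains against a *frozen* agent from level k-1, so the partner can adapt at interaction time but never learns during training, which is what blocks co-trained implicit coordination.

```python
# Hypothetical sketch of nested training against frozen lower-level partners.
# All class and method names are illustrative assumptions, not the paper's API.

class Agent:
    """Toy stand-in for a policy; a real version would wrap an RL learner."""
    def __init__(self, level):
        self.level = level
        self.partner_model = None   # the frozen partner it was trained against
        self.frozen = False

    def train_against(self, partner):
        # Placeholder for an RL training loop (e.g. PPO) in which `partner`
        # adapts its behavior online but its weights are never updated.
        self.partner_model = partner
        return self

    def freeze(self):
        # After training, freeze this agent so the next level up can use it
        # as a non-learning (but still adaptive) training partner.
        self.frozen = True
        return self

def nested_training(max_level):
    """Level 0 is a base partner (e.g. a scripted or pretrained policy);
    each higher level is trained against the frozen level below it."""
    agent = Agent(level=0).freeze()
    for k in range(1, max_level + 1):
        agent = Agent(level=k).train_against(agent).freeze()
    return agent

ego = nested_training(max_level=2)
print(ego.level)                # final ego agent's level
print(ego.partner_model.level)  # it trained against the level just below
```

The finite nesting depth mirrors the finite-level I-POMDP solution the paper approximates: each level reasons about a bounded model of its partner rather than an infinitely recursive one.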
Problem

Research questions and friction points this paper is trying to address.

mutual adaptation
human-AI teaming
adaptive behavior
generalization
interactive POMDP
Innovation

Methods, ideas, or system contributions that make the work stand out.

nested training
mutual adaptation
I-POMDP
human-AI teaming
adaptive behavior
🔎 Similar Papers
No similar papers found.