Partner-Aware Hierarchical Skill Discovery for Robust Human-AI Collaboration

📅 2026-05-22
🤖 AI Summary
This work addresses a limitation of existing deep hierarchical reinforcement learning (DHRL) methods: they ignore partner behavior, so the skills they learn rely on spurious correlations and generalize poorly to partners with diverse or dynamic strategies. The paper introduces Partner-Aware Skill Discovery (PASD), a DHRL framework that conditions skill learning on partner behavior. PASD uses a contrastive intrinsic reward derived from partner interactions, which aligns skill representations across similar partners while keeping them distinguishable across differing partner strategies, thereby mitigating shortcut learning. Experiments on the Overcooked-AI benchmark, using a diverse partner population and human proxy models trained on human-human gameplay trajectories, show that PASD consistently outperforms existing population-based and hierarchical baselines and generalizes across a wide range of partner behaviors.
📝 Abstract
Multi-agent collaboration, especially in human-AI teaming, requires agents that can adapt to novel partners with diverse and dynamic behaviors. Conventional Deep Hierarchical Reinforcement Learning (DHRL) methods focus on agent-centric rewards and overlook partner behavior, leading to shortcut learning, where skills exploit spurious information instead of adapting to partners' dynamic behaviors. This limitation undermines agents' ability to adapt and coordinate effectively with novel partners. We introduce Partner-Aware Skill Discovery (PASD), a DHRL framework that learns skills conditioned on partner behavior. PASD introduces a contrastive intrinsic reward to capture patterns emerging from partner interactions, aligning skill representations across similar partners while maintaining discriminability across diverse strategies. By structuring the skill space based on partner interactions, this approach mitigates shortcut learning and promotes behavioral consistency, enabling robust and adaptive coordination. We extensively evaluate PASD in the Overcooked-AI benchmark with a diverse population of partners characterized by varying skill levels and play styles. We further evaluate the approach with human proxy models trained from human-human gameplay trajectories. PASD consistently outperforms existing population-based and hierarchical baselines, demonstrating transferable skill learning that generalizes across a wide range of partner behaviors. Analysis of learned skill representations shows that PASD adapts effectively to diverse partner behaviors, highlighting its robustness in human-AI collaboration.
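To make the idea of a contrastive intrinsic reward concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not specify. It assumes an InfoNCE-style objective over cosine similarities. The function name `contrastive_intrinsic_reward`, the `temperature` parameter, and the embedding inputs are all hypothetical names introduced for illustration.

```python
import numpy as np


def contrastive_intrinsic_reward(skill_emb, partner_embs, positive_idx, temperature=0.1):
    """InfoNCE-style reward (assumed form, not taken from the paper).

    skill_emb:    (d,) embedding of the agent's current skill.
    partner_embs: (n, d) embeddings of candidate partner behaviors.
    positive_idx: row of partner_embs for the partner actually being played with.
    Returns the log-probability that the skill embedding identifies its partner.
    """
    # Normalize so that dot products are cosine similarities.
    s = skill_emb / np.linalg.norm(skill_emb)
    p = partner_embs / np.linalg.norm(partner_embs, axis=1, keepdims=True)

    # Similarity of the skill to every candidate partner, sharpened by temperature.
    logits = (p @ s) / temperature

    # Numerically stable log-softmax over the candidates.
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())

    # High when the skill matches its own partner and differs from the others.
    return float(log_probs[positive_idx])


# Toy usage: three partner behaviors in a 2-D embedding space.
partners = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
reward_matched = contrastive_intrinsic_reward(np.array([1.0, 0.0]), partners, positive_idx=0)
reward_mismatched = contrastive_intrinsic_reward(np.array([0.0, 1.0]), partners, positive_idx=0)
print(reward_matched > reward_mismatched)
```

Under these assumptions, the reward is never positive and approaches zero as the skill embedding aligns with its own partner and separates from the rest, which mirrors the abstract's description of aligning similar partners while keeping diverse strategies discriminable.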
Problem

Research questions and friction points this paper is trying to address.

human-AI collaboration
multi-agent collaboration
shortcut learning
partner adaptation
hierarchical reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partner-Aware Skill Discovery
Hierarchical Reinforcement Learning
Contrastive Intrinsic Reward
Human-AI Collaboration
Skill Generalization