BCR-DRL: Behavior- and Context-aware Reward for Deep Reinforcement Learning in Human-AI Coordination

📅 2024-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low learning efficiency in deep reinforcement learning (DRL) agents for human-robot collaboration—caused by sparse extrinsic rewards and unpredictable human behavior—this paper proposes a behavior- and context-aware dual intrinsic reward mechanism. The method introduces, for the first time, a synergistic intrinsic reward that jointly models human motivation and AI-driven self-motivation, coupled with a learning-progress-sensitive dynamic context-weighting strategy that simultaneously mitigates insufficient exploration and exploitation bias. The framework integrates intrinsic motivation modeling, context-aware weight optimization, and logarithm-based sparse-reward capture. Evaluated on the Overcooked benchmark, the approach achieves approximately 20% higher cumulative sparse reward and reduces policy convergence time by 67% compared with state-of-the-art methods, significantly improving both the learning efficiency and the robustness of collaborative policies.

📝 Abstract
Deep Reinforcement Learning (DRL) offers a powerful framework for training AI agents to coordinate with human partners. However, DRL faces two critical challenges in human-AI coordination (HAIC): sparse rewards and unpredictable human behaviors. These challenges significantly limit DRL's ability to identify effective coordination policies, because they impair its capacity to balance exploration and exploitation. To address these limitations, we propose an innovative behavior- and context-aware reward (BCR) for DRL, which optimizes exploration and exploitation by leveraging human behaviors and contextual information in HAIC. Our BCR consists of two components: (i) novel dual intrinsic rewards to enhance exploration. This scheme comprises an AI self-motivated intrinsic reward and a human-motivated intrinsic reward, which are designed to increase the capture of sparse rewards via a logarithm-based strategy; and (ii) new context-aware weights for the designed rewards to improve exploitation. This mechanism helps the AI agent prioritize actions that better coordinate with the human partner by utilizing contextual information that reflects the evolution of learning in HAIC. Extensive simulations in the Overcooked environment demonstrate that our approach can increase cumulative sparse rewards by approximately 20% and reduce convergence time by about 67% compared with state-of-the-art baselines.
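The abstract describes combining a logarithm-based capture of the sparse extrinsic reward with two intrinsic rewards under context-aware weights. The sketch below illustrates one plausible shape of such a combination; the function names, the `log1p` capture term, and the linear weight-annealing schedule are all assumptions for illustration, not the paper's actual formulation.

```python
import math


def context_weights(progress: float, base: float = 0.5) -> tuple[float, float]:
    """Hypothetical context-aware weighting: as learning progresses
    (progress in [0, 1]), the intrinsic-reward weights are annealed so the
    agent shifts from exploration toward exploiting the sparse reward."""
    w = base * (1.0 - min(max(progress, 0.0), 1.0))
    return w, w  # (AI self-motivated weight, human-motivated weight)


def bcr_reward(r_sparse: float,
               r_ai_intrinsic: float,
               r_human_intrinsic: float,
               w_ai: float,
               w_human: float) -> float:
    """Hypothetical BCR combination: a logarithmic term amplifies small
    sparse extrinsic rewards, and the context-aware weights scale the two
    intrinsic rewards."""
    r_log = math.log1p(max(r_sparse, 0.0))  # assumed log-based capture
    return r_log + w_ai * r_ai_intrinsic + w_human * r_human_intrinsic
```

For example, early in training (`progress = 0`) the intrinsic terms carry full weight, while near convergence (`progress = 1`) the shaped reward reduces to the logarithmic sparse term alone.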
Problem

Research questions and friction points this paper is trying to address.

Sparse Rewards
Unpredictable Human Behavior
Deep Reinforcement Learning in Human-Robot Collaboration
Innovation

Methods, ideas, or system contributions that make the work stand out.

BCR-DRL
Dual-source Reward
Adaptive Reward Adjustment
Xin Hao
School of Information Technology, Deakin University, Victoria, Australia
Bahareh Nakisa
Senior Lecturer in AI, Deakin University
Human-Machine Teaming · Trust · Affective Computing · Ethical AI · Cognition
Mohmmad Naim Rastgoo
School of Computer and Mathematical Sciences, University of Adelaide, South Australia, Australia
Richard Dazeley
Professor of Artificial Intelligence and Machine Learning, School of Information Technology, Deakin University
Multi-objective Reinforcement Learning · AI Safety · Explainable AI · AI Alignment · Reinforcement Learning
Gaoyang Pang
School of Information Technology, Deakin University, Victoria, Australia