Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low exploration efficiency, reliance on hand-crafted individual rewards, and poor generalization under sparse team rewards in multi-agent reinforcement learning (MARL), this paper proposes an end-to-end reward-shaping framework that integrates human expert knowledge. Methodologically, it combines multi-agent Q-learning, action-distribution modeling, and expert-preference integration. Its core contribution is the first differentiable, operational representation of human expert preference distributions in MARL, which supports an intrinsic individual reward mechanism that works in concert with the team reward. Agents thus optimize the joint action value while implicitly aligning with human expertise. Experiments on multiple sparse-reward benchmark tasks show significant improvements over state-of-the-art baselines, along with better knowledge transferability and cross-task reusability.
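
The summary describes an intrinsic reward built from each agent's action distribution and a human-expertise preference distribution, but the page gives no pseudocode. The following is a minimal illustrative sketch, assuming the agent's action distribution is a softmax over its local Q-values and the intrinsic signal is a negative KL divergence to an expert preference vector; the names `expert_pref` and `beta` are assumptions for illustration, not the authors' notation.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = (x - x.max(axis=-1, keepdims=True)) / temperature
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def intrinsic_reward(q_values, expert_pref, beta=0.1, eps=1e-8):
    """Sketch of a preference-alignment intrinsic reward: the agent is
    rewarded for keeping its action distribution (softmax over local
    Q-values) close to a human expert preference distribution.
    `expert_pref` and `beta` are assumed names/hyperparameters.
    """
    pi = softmax(q_values)  # agent's current action distribution
    # KL(pi || expert_pref): small divergence -> larger intrinsic reward
    kl = np.sum(pi * (np.log(pi + eps) - np.log(expert_pref + eps)), axis=-1)
    return -beta * kl

# Toy usage: 3 actions, expert prefers action 0.
q = np.array([1.2, 0.3, -0.5])
pref = np.array([0.7, 0.2, 0.1])
print(intrinsic_reward(q, pref))
```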

📝 Abstract
Efficient exploration in multi-agent reinforcement learning (MARL) is challenging when agents receive only a team reward, especially in environments with sparse rewards. A powerful way to mitigate this issue is to craft dense individual rewards that guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, and thus perform less effectively than humans at learning and generalization in complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both its individual action distribution and the human expertise preference distribution. LIGHT then designs individual intrinsic rewards for each agent based on an actionable representational transformation relevant to Q-learning, so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines in both performance and knowledge reusability across different sparse-reward tasks in challenging scenarios.
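
The abstract states that the intrinsic rewards are designed to work alongside the joint action value in Q-learning. As a rough illustration of one way such shaping could enter a value-based MARL update, here is a minimal sketch assuming a simple additive combination of the team reward and per-agent intrinsic rewards inside a one-step TD target; the weighting `lam` and this additive form are assumptions, not LIGHT's exact formulation.

```python
import numpy as np

def shaped_td_target(r_team, intrinsic_rewards, q_tot_next, gamma=0.99, lam=0.5):
    """One-step TD target with additive intrinsic shaping (a sketch under
    assumed notation, not the paper's exact update). `lam` weights the
    per-agent intrinsic rewards against the shared team reward.
    """
    r_shaped = r_team + lam * np.sum(intrinsic_rewards)
    return r_shaped + gamma * q_tot_next

# Toy usage: sparse team reward (zero this step), two agents with
# small intrinsic bonuses from preference alignment.
target = shaped_td_target(r_team=0.0,
                          intrinsic_rewards=np.array([0.03, -0.01]),
                          q_tot_next=1.7)
print(target)  # 0.01 + 0.99 * 1.7 = 1.693
```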
Problem

Research questions and friction points this paper is trying to address.

Addressing sparse rewards in multi-agent reinforcement learning
Integrating human expertise to guide agent exploration
Designing individual intrinsic rewards for efficient learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates human expertise into MARL end-to-end
Guides agents using action and human preference distributions
Designs intrinsic rewards via actionable Q-learning transformation
Authors

Xuefei Wu, Department of Control Science and Intelligent Engineering, School of Management and Engineering, Nanjing University, Nanjing 210093, China
Xiao Yin, Department of Control Science and Intelligent Engineering, School of Management and Engineering, Nanjing University, Nanjing 210093, China
Yuanyang Zhu, Nanjing University (Reinforcement learning, Interpretability, Machine learning, AI4Science)
Chunlin Chen, Nanjing University (Reinforcement Learning, Quantum Control, Mobile Robotics)