Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise

📅 2025-07-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low exploration efficiency, reliance on hand-crafted individual rewards, and poor generalization under sparse team rewards in multi-agent reinforcement learning (MARL), this paper proposes an end-to-end reward-shaping framework that integrates human expert knowledge. Methodologically, it combines multi-agent Q-learning, action-distribution modeling, and expert-preference integration. Its core contribution is the first differentiable, operational representation of human expert preference distributions in MARL, which supports an intrinsic individual reward mechanism that works in concert with the team reward. Agents thus optimize the joint action value while implicitly aligning with human expertise. Experiments on multiple sparse-reward benchmark tasks show significant improvements over state-of-the-art baselines, along with better knowledge transferability and cross-task reusability.
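
The summary describes an intrinsic reward built from each agent's action distribution and a human-expertise preference distribution, but the page gives no pseudocode. The following is a minimal illustrative sketch, assuming the agent's action distribution is a softmax over its local Q-values and the intrinsic signal is a negative KL divergence to an expert preference vector; the names `expert_pref` and `beta` are assumptions for illustration, not the authors' notation.

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = (x - x.max(axis=-1, keepdims=True)) / temperature
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def intrinsic_reward(q_values, expert_pref, beta=0.1, eps=1e-8):
    """Sketch of a preference-alignment intrinsic reward: the agent is
    rewarded for keeping its action distribution (softmax over local
    Q-values) close to a human expert preference distribution.
    `expert_pref` and `beta` are assumed names/hyperparameters.
    """
    pi = softmax(q_values)  # agent's current action distribution
    # KL(pi || expert_pref): small divergence -> larger intrinsic reward
    kl = np.sum(pi * (np.log(pi + eps) - np.log(expert_pref + eps)), axis=-1)
    return -beta * kl

# Toy usage: 3 actions, expert prefers action 0.
q = np.array([1.2, 0.3, -0.5])
pref = np.array([0.7, 0.2, 0.1])
print(intrinsic_reward(q, pref))
```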

📝 Abstract
Efficient exploration in multi-agent reinforcement learning (MARL) is challenging when agents receive only a team reward, especially in environments with sparse rewards. A powerful way to mitigate this issue is to craft dense individual rewards that guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, and thus perform less effectively than humans at learning and generalization in complex problems. To tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner. LIGHT guides each agent to avoid unnecessary exploration by considering both its individual action distribution and the human expertise preference distribution. LIGHT then designs individual intrinsic rewards for each agent based on an actionable representational transformation relevant to Q-learning, so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines in both performance and knowledge reusability across different sparse-reward tasks in challenging scenarios.
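
The abstract states that the intrinsic rewards are designed to work alongside the joint action value in Q-learning. As a rough illustration of one way such shaping could enter a value-based MARL update, here is a minimal sketch assuming a simple additive combination of the team reward and per-agent intrinsic rewards inside a one-step TD target; the weighting `lam` and this additive form are assumptions, not LIGHT's exact formulation.

```python
import numpy as np

def shaped_td_target(r_team, intrinsic_rewards, q_tot_next, gamma=0.99, lam=0.5):
    """One-step TD target with additive intrinsic shaping (a sketch under
    assumed notation, not the paper's exact update). `lam` weights the
    per-agent intrinsic rewards against the shared team reward.
    """
    r_shaped = r_team + lam * np.sum(intrinsic_rewards)
    return r_shaped + gamma * q_tot_next

# Toy usage: sparse team reward (zero this step), two agents with
# small intrinsic bonuses from preference alignment.
target = shaped_td_target(r_team=0.0,
                          intrinsic_rewards=np.array([0.03, -0.01]),
                          q_tot_next=1.7)
print(target)  # 0.01 + 0.99 * 1.7 = 1.693
```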
Problem

Research questions and friction points this paper is trying to address.

Addressing sparse rewards in multi-agent reinforcement learning
Integrating human expertise to guide agent exploration
Designing individual intrinsic rewards for efficient learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates human expertise into MARL end-to-end
Guides agents using action and human preference distributions
Designs intrinsic rewards via actionable Q-learning transformation
Authors

Xuefei Wu, Department of Control Science and Intelligent Engineering, School of Management and Engineering, Nanjing University, Nanjing 210093, China
Xiao Yin, Department of Control Science and Intelligent Engineering, School of Management and Engineering, Nanjing University, Nanjing 210093, China
Yuanyang Zhu, Nanjing University (Reinforcement learning, Interpretability, Machine learning, AI4Science)
Chunlin Chen, Nanjing University (Reinforcement Learning, Quantum Control, Mobile Robotics)