PIMAEX: Multi-Agent Exploration through Peer Incentivization

πŸ“… 2025-01-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses two key challenges in multi-agent reinforcement learning (MARL): insufficient exploration and difficult credit assignment. To this end, we propose PIMAEX, a peer-incentivized intrinsic motivation mechanism that jointly models influence and curiosity. Its core innovation lies in the first integration of influence-aware behavioral prediction with curiosity-driven exploration, yielding a scalable, peer-based intrinsic reward function that encourages agents to actively shape peers' behavior in order to discover novel states. Furthermore, we introduce PIMAEX-Communication, a communication-augmented training framework that enhances coordinated exploration under partial observability (e.g., in the Consume/Explore environment). Evaluated on tasks featuring deceptive rewards and severe credit-assignment difficulty, PIMAEX significantly improves both exploration efficiency and final policy performance, consistently outperforming state-of-the-art baselines across diverse MARL benchmarks.

πŸ“ Abstract
While exploration in single-agent reinforcement learning has been studied extensively in recent years, considerably less work has focused on its counterpart in multi-agent reinforcement learning. To address this issue, this work proposes a peer-incentivized reward function inspired by previous research on intrinsic curiosity and influence-based rewards. The PIMAEX reward, short for Peer-Incentivized Multi-Agent Exploration, aims to improve exploration in the multi-agent setting by encouraging agents to exert influence over each other to increase the likelihood of encountering novel states. We evaluate the PIMAEX reward in conjunction with PIMAEX-Communication, a multi-agent training algorithm that employs a communication channel for agents to influence one another. The evaluation is conducted in the Consume/Explore environment, a partially observable environment with deceptive rewards, specifically designed to challenge the exploration vs. exploitation dilemma and the credit-assignment problem. The results empirically demonstrate that agents using the PIMAEX reward with PIMAEX-Communication outperform those that do not.
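The abstract describes a reward that combines an extrinsic signal with curiosity (novelty of states) and peer influence (how much one agent's action changes a peer's behavior). The paper's exact formulation is not given on this page; the following is a minimal sketch under common assumptions from the literature it cites: curiosity as the prediction error of a forward model (ICM-style) and influence as the KL divergence between a peer's action distribution conditioned on the agent's action versus without it. All function names and the weights `alpha`/`beta` are hypothetical.

```python
import numpy as np

def curiosity_bonus(pred_next_state, true_next_state):
    """Novelty signal: prediction error of a learned forward model
    (ICM-style); large error suggests an unfamiliar state."""
    pred = np.asarray(pred_next_state, dtype=float)
    true = np.asarray(true_next_state, dtype=float)
    return float(np.mean((pred - true) ** 2))

def influence_bonus(peer_policy_with, peer_policy_without, eps=1e-8):
    """Influence signal: KL divergence between a peer's action
    distribution given our action and the counterfactual one without it
    (social-influence-style). Zero when our action changes nothing."""
    p = np.asarray(peer_policy_with, dtype=float) + eps
    q = np.asarray(peer_policy_without, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def peer_incentivized_reward(r_ext, pred_next, true_next,
                             peer_with, peer_without,
                             alpha=0.1, beta=0.1):
    """Combined per-step reward: extrinsic + weighted curiosity and
    influence terms (weights alpha/beta are assumed hyperparameters)."""
    return (r_ext
            + alpha * curiosity_bonus(pred_next, true_next)
            + beta * influence_bonus(peer_with, peer_without))
```

With identical peer distributions and a perfect forward model, both bonuses vanish and the combined reward reduces to the extrinsic term; the agent is thus rewarded only when it reaches states its model cannot yet predict or when its actions measurably shift a peer's policy.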
Problem

Research questions and friction points this paper is trying to address.

Multi-Robot Learning
Challenging Environment Exploration
Credit Assignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

PIMAEX Reward Method
PIMAEX-Communication Training Method
Multi-Robot Exploration Efficiency
πŸ”Ž Similar Papers
No similar papers found.
Authors

Michael Kölle — Institute of Informatics, LMU Munich, Munich, Germany
Johannes Tochtermann — LMU Munich (Reinforcement Learning)
Julian Schönberger — Institute of Informatics, LMU Munich, Munich, Germany
Gerhard Stenzel — PhD Student, LMU Munich (Quantum Machine Learning, Optimization, Computer Science)
Philipp Altmann — LMU Munich (Collective Intelligence, Reinforcement Learning, Quantum Machine Learning, Surrogate Modeling)
Claudia Linnhoff-Popien — Institute of Informatics, LMU Munich, Munich, Germany