PIMAEX: Multi-Agent Exploration through Peer Incentivization

📅 2025-01-02



🤖 AI Summary

This work addresses two key challenges in multi-agent reinforcement learning (MARL): insufficient exploration and difficult credit assignment. To this end, we propose PIMAEX (Peer-Incentivized Multi-Agent Exploration), an intrinsic reward that combines ideas from influence-based rewards and intrinsic curiosity. The reward encourages agents to influence their peers' behavior so that the group is more likely to encounter novel states. We also introduce PIMAEX-Communication, a multi-agent training algorithm in which agents influence one another through a communication channel. The evaluation uses Consume/Explore, a partially observable environment with deceptive rewards designed to stress the exploration vs. exploitation dilemma and the credit-assignment problem. In this environment, agents using the PIMAEX reward with PIMAEX-Communication outperform agents that do not.
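The summary refers to influence-aware behavioral prediction but does not say how influence is measured. As a hedged illustration only, one common way to quantify how much an agent's action shapes a peer's behavior is a counterfactual KL divergence in the style of social-influence rewards: compare the peer's predicted action distribution given the action actually taken with the peer's distribution averaged over the actions the agent could have taken. The functions `kl_divergence` and `counterfactual_influence`, and the assumption that a learned model supplies the peer's conditional action distributions, are hypothetical; this is not presented as PIMAEX's actual definition.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


def counterfactual_influence(peer_policy_given, my_action_probs, taken_action):
    """Hypothetical influence score of one agent on one peer.

    peer_policy_given: shape (n_my_actions, n_peer_actions); row a is the peer's
        predicted action distribution had this agent taken action a
        (assumed to come from a learned model of the peer).
    my_action_probs: this agent's policy over its own actions.
    taken_action: index of the action this agent actually took.
    """
    table = np.asarray(peer_policy_given, dtype=float)
    probs = np.asarray(my_action_probs, dtype=float)
    # Peer behavior averaged over what this agent could have done instead.
    marginal = probs @ table
    # Large divergence: the chosen action shifted the peer's behavior a lot.
    return kl_divergence(table[taken_action], marginal)


if __name__ == "__main__":
    # The peer tends to copy this agent's action; this agent is uniform over 2 actions.
    print(counterfactual_influence([[0.9, 0.1], [0.1, 0.9]], [0.5, 0.5], 0))
```

In the example the peer largely mirrors the agent, so the score is positive (approximately 0.368); if the peer's distribution did not depend on the agent's action, every row would equal the marginal and the score would be zero.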


📝 Abstract

While exploration in single-agent reinforcement learning has been studied extensively in recent years, considerably less work has focused on its counterpart in multi-agent reinforcement learning. To address this issue, this work proposes a peer-incentivized reward function inspired by previous research on intrinsic curiosity and influence-based rewards. The PIMAEX reward, short for Peer-Incentivized Multi-Agent Exploration, aims to improve exploration in the multi-agent setting by encouraging agents to exert influence over each other to increase the likelihood of encountering novel states. We evaluate the PIMAEX reward in conjunction with PIMAEX-Communication, a multi-agent training algorithm that employs a communication channel for agents to influence one another. The evaluation is conducted in the Consume/Explore environment, a partially observable environment with deceptive rewards, specifically designed to challenge the exploration vs. exploitation dilemma and the credit-assignment problem. The results empirically demonstrate that agents using the PIMAEX reward with PIMAEX-Communication outperform those that do not.
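The abstract describes the PIMAEX reward only qualitatively. The sketch below shows one hypothetical way a peer-incentivized exploration bonus could be assembled, assuming (a) a simple count-based novelty score 1/sqrt(N(s)) stands in for a learned curiosity model, and (b) an influence matrix `influence[i][j]` (how strongly agent i affected agent j) is already available, for example from a counterfactual estimate. Agent i's bonus is its own novelty plus the novelty of the peers it influenced, weighted by that influence. `CountNovelty`, `peer_incentivized_rewards`, `alpha` and `beta` are illustrative names, not the paper's formula.

```python
import math
from collections import Counter

import numpy as np


class CountNovelty:
    """Count-based novelty 1/sqrt(N(s)); a stand-in for a learned curiosity model."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, state):
        self.counts[state] += 1
        return 1.0 / math.sqrt(self.counts[state])


def peer_incentivized_rewards(novelty, influence, alpha=1.0, beta=1.0):
    """Hypothetical per-agent exploration bonus.

    novelty: novelty score of each agent's current state, shape (n_agents,).
    influence: influence[i][j] = how strongly agent i affected agent j.
    Returns alpha * own novelty + beta * influence-weighted peer novelty.
    """
    novelty = np.asarray(novelty, dtype=float)
    infl = np.array(influence, dtype=float)  # copy so the caller's data is untouched
    np.fill_diagonal(infl, 0.0)  # no peer bonus for "influencing" oneself
    return alpha * novelty + beta * (infl @ novelty)


if __name__ == "__main__":
    novelty_fn = CountNovelty()
    states = ["room_a", "room_b", "room_a"]  # hypothetical discrete states, one per agent
    novelty = [novelty_fn(s) for s in states]
    influence = [[0.0, 0.2, 0.0], [0.0, 0.0, 0.0], [0.4, 0.0, 0.0]]
    print(peer_incentivized_rewards(novelty, influence))
```

With the weights `alpha` and `beta`, an agent is rewarded both for reaching new states itself and for steering peers toward new states, which matches the intuition the abstract gives for PIMAEX; how the actual method weights these terms is not stated here.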

Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning

Challenging Environment Exploration

Credit Assignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

PIMAEX Reward Method

PIMAEX-Communication Training Method

Multi-Agent Exploration Efficiency
