🤖 AI Summary
This work addresses cooperative multi-agent reinforcement learning under submodular joint rewards, where the marginal gain of adding agents diminishes. It establishes the first formal theoretical framework for this setting and proposes efficient policy optimization and learning algorithms. When the environment dynamics are known, a polynomial-time greedy policy optimization method achieves a $1/2$-approximation ratio. When the dynamics are unknown, an online learning algorithm built on an upper confidence bound (UCB) mechanism yields a $1/2$-regret bound of $O(H^2 K S \sqrt{A T})$. By circumventing the curse of dimensionality in the joint policy space, this approach significantly improves sample efficiency and provides the first theoretically guaranteed solution for multi-agent coordination with submodular rewards.
📝 Abstract
In this paper, we study cooperative multi-agent reinforcement learning (MARL) where the joint reward exhibits submodularity, a natural property capturing diminishing marginal returns when adding agents to a team. Unlike standard MARL with additive rewards, submodular rewards model realistic scenarios where agent contributions overlap (e.g., multi-drone surveillance, collaborative exploration). We provide the first formal framework for this setting and develop algorithms with provable guarantees on sample efficiency and regret. For known dynamics, our greedy policy optimization achieves a $1/2$-approximation with polynomial complexity in the number of agents $K$, overcoming the exponential curse of dimensionality inherent in joint policy optimization. For unknown dynamics, we propose a UCB-based learning algorithm achieving a $1/2$-regret of $O(H^2KS\sqrt{AT})$ over $T$ episodes.
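To make the greedy idea concrete, here is a minimal illustrative sketch (not the paper's actual algorithm) of agent-by-agent greedy selection under a submodular joint reward. The reward is modeled as coverage, as in the multi-drone surveillance example: each candidate "policy" is abstracted to the set of cells it covers, and each agent in turn picks the policy with the largest marginal gain given the choices already made. The names `coverage`, `greedy_policy_selection`, and the toy cell sets are hypothetical.

```python
# Illustrative sketch, not the paper's method: agent-by-agent greedy
# selection under a submodular (coverage) joint reward. Each agent's
# candidate policies are abstracted to the sets of cells they cover.

def coverage(chosen):
    """Submodular joint reward: number of distinct cells covered."""
    return len(set().union(*chosen)) if chosen else 0

def greedy_policy_selection(candidate_policies):
    """candidate_policies[k] lists agent k's candidate policies, each a
    frozenset of covered cells. Agents choose sequentially; each picks
    the policy maximizing the marginal gain over prior choices."""
    chosen = []
    for agent_policies in candidate_policies:
        base = coverage(chosen)
        best = max(agent_policies,
                   key=lambda p: coverage(chosen + [p]) - base)
        chosen.append(best)
    return chosen

# Toy instance: 3 agents surveil grid cells; contributions overlap,
# so marginal gains diminish as more agents are added.
policies = [
    [frozenset({1, 2, 3}), frozenset({3, 4})],
    [frozenset({2, 3}), frozenset({4, 5, 6})],
    [frozenset({1, 6}), frozenset({5, 6})],
]
chosen = greedy_policy_selection(policies)
print(coverage(chosen))  # prints 6
```

Because each agent optimizes only its own choice given its predecessors, the search is polynomial in $K$ rather than exponential in the joint policy space; for monotone submodular objectives, this sequential greedy structure is what underlies the $1/2$-approximation guarantee.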