Multi-Agent Reinforcement Learning with Submodular Reward

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses cooperative multi-agent reinforcement learning under submodular joint rewards, where the marginal gain from adding an agent to a team diminishes as the team grows. It establishes the first formal theoretical framework for this setting and proposes efficient policy optimization and learning algorithms. When the environment dynamics are known, a polynomial-time greedy policy optimization method achieves a 1/2-approximation ratio. When the dynamics are unknown, an online learning algorithm built on an upper confidence bound (UCB) mechanism achieves a 1/2-regret bound of $O(H^2 K S \sqrt{A T})$. By circumventing the curse of dimensionality in the joint policy space, the approach significantly improves sample efficiency and provides the first theoretically guaranteed solution for multi-agent coordination with submodular rewards.

📝 Abstract
In this paper, we study cooperative multi-agent reinforcement learning (MARL) where the joint reward exhibits submodularity, a natural property capturing diminishing marginal returns when agents are added to a team. Unlike standard MARL with additive rewards, submodular rewards model realistic scenarios where agent contributions overlap (e.g., multi-drone surveillance, collaborative exploration). We provide the first formal framework for this setting and develop algorithms with provable guarantees on sample efficiency and regret. For known dynamics, our greedy policy optimization achieves a $1/2$-approximation with complexity polynomial in the number of agents $K$, overcoming the exponential curse of dimensionality inherent in joint policy optimization. For unknown dynamics, we propose a UCB-based learning algorithm achieving a $1/2$-regret of $O(H^2KS\sqrt{AT})$ over $T$ episodes.
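As a concrete illustration of diminishing marginal returns and the greedy-selection idea the abstract describes, here is a minimal Python sketch using a hypothetical coverage objective. The drone names and sensing footprints are invented for illustration, and greedily choosing team members is a simplification: the paper's algorithm greedily optimizes per-agent policies, not just team membership.

```python
def coverage(team, footprints):
    """Submodular joint reward: number of distinct cells covered by the team.

    Adding an agent whose footprint overlaps the team's coverage yields a
    smaller marginal gain, which is exactly the diminishing-returns property.
    """
    covered = set()
    for agent in team:
        covered |= footprints[agent]
    return len(covered)

def greedy_select(footprints, budget):
    """Pick up to `budget` agents, one at a time, by largest marginal gain."""
    team = []
    remaining = set(footprints)
    for _ in range(budget):
        base = coverage(team, footprints)
        best, best_gain = None, 0
        for agent in remaining:
            gain = coverage(team + [agent], footprints) - base
            if gain > best_gain:
                best, best_gain = agent, gain
        if best is None:  # no agent adds coverage; stop early
            break
        team.append(best)
        remaining.remove(best)
    return team

# Hypothetical sensing footprints for three drones.
footprints = {
    "drone_a": {1, 2, 3, 4},
    "drone_b": {3, 4, 5},
    "drone_c": {6, 7},
}
team = greedy_select(footprints, budget=2)
# Greedy picks drone_a first (gain 4); drone_b then adds only cell 5 (gain 1),
# so drone_c (gain 2) is chosen second.
```

For monotone submodular objectives, this kind of greedy rule is what underlies constant-factor approximation guarantees such as the paper's 1/2 ratio, while avoiding search over the exponential space of joint choices.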
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning
Submodular Reward
Cooperative MARL
Diminishing Marginal Returns
Sample Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Submodular Reward
Multi-Agent Reinforcement Learning
Greedy Policy Optimization
Regret Bound
Sample Efficiency
Wenjing Chen, Department of Computer Science and Engineering, Texas A&M University
Chengyuan Qian, Department of Computer Science and Engineering, Texas A&M University
Shuo Xing, Texas A&M University (Large Language Models, Natural Language Processing, Machine Learning)
Yi Zhou, Texas A&M University (machine learning, optimization, signal processing)
Victoria Crawford, Department of Computer Science and Engineering, Texas A&M University