Multi-Agent Combinatorial-Multi-Armed-Bandit framework for the Submodular Welfare Problem under Bandit Feedback

📅 2026-02-18
🤖 AI Summary
This study addresses submodular welfare maximization among multiple non-communicating agents, each with a monotone submodular utility function, when only bandit feedback is available. The authors propose a Multi-Agent Combinatorial Multi-Armed Bandit (MA-CMAB) framework that combines an explore-then-commit strategy, randomized assignments, and submodular optimization techniques to make decisions under shared allocation constraints. According to the authors, this is the first regret guarantee for the partition-based submodular welfare problem under bandit feedback: a regret upper bound of Õ(T^{2/3}) measured against a (1−1/e)-approximation benchmark.
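
To make the stated guarantee concrete, the following is a sketch of how regret against a (1−1/e) benchmark is usually defined, together with the standard balancing argument that produces a T^{2/3} rate for explore-then-commit. The symbols W (total welfare), S_t (partition played in round t), S^* (optimal partition), and m (number of exploration rounds) are assumptions introduced here for illustration; the summary does not give the paper's exact definitions, constants, or dependence on the number of items and agents.

```latex
% Approximate regret against the (1 - 1/e) benchmark (usual convention, assumed here)
\[
  \mathcal{R}_{\alpha}(T)
  \;=\; \alpha\, T\, W(S^{*}) \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} W(S_t)\right],
  \qquad \alpha = 1 - \tfrac{1}{e}.
\]

% Heuristic explore-then-commit trade-off: m exploration rounds cost O(m),
% and estimation error of order sqrt(log T / m) is paid in each remaining round.
\[
  \mathcal{R}_{\alpha}(T) \;\lesssim\; m \;+\; T\sqrt{\frac{\log T}{m}}
  \quad\Longrightarrow\quad
  m \asymp T^{2/3}(\log T)^{1/3},
  \qquad
  \mathcal{R}_{\alpha}(T) = \tilde{\mathcal{O}}\!\left(T^{2/3}\right).
\]
```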

📝 Abstract
We study the \emph{Submodular Welfare Problem} (SWP), where items are partitioned among agents with monotone submodular utilities to maximize the total welfare under \emph{bandit feedback}. Classical treatments of SWP assume full value-oracle access and achieve $(1-1/e)$-approximations via continuous-greedy algorithms. We extend this to a \emph{multi-agent combinatorial bandit} framework (\textsc{MA-CMAB}), where actions are partitions under full-bandit feedback with non-communicating agents. Unlike prior single-agent or separable multi-agent CMAB models, our setting couples agents through shared allocation constraints. We propose an explore-then-commit strategy with randomized assignments, achieving $\tilde{\mathcal{O}}(T^{2/3})$ regret against a $(1-1/e)$ benchmark, the first such guarantee for the partition-based submodular welfare problem under bandit feedback.
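
The explore-then-commit idea with randomized assignments can be illustrated with a minimal Python sketch. This is not the paper's algorithm and carries no (1−1/e) guarantee: it explores with uniformly random partitions, observes only a noisy total welfare (full-bandit feedback), averages the observed reward per (item, agent) pair, and then commits to the assignment that picks the best-scoring agent for each item. The toy coverage utilities, the function names, and the choice of roughly T^{2/3} exploration rounds are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_toy_instance(n_items, n_agents, rng, universe=12):
    """Hypothetical instance: covers[agent][item] is the set of elements
    that `item` covers for `agent` (coverage utilities are monotone submodular)."""
    return [
        [set(rng.sample(range(universe), rng.randint(1, 4))) for _ in range(n_items)]
        for _ in range(n_agents)
    ]


def coverage_welfare(assignment, covers):
    """Total welfare of a partition: assignment[item] is the receiving agent."""
    covered = defaultdict(set)
    for item, agent in enumerate(assignment):
        covered[agent] |= covers[agent][item]
    return sum(len(elements) for elements in covered.values())


def explore_then_commit(n_items, n_agents, pull, horizon, rng):
    """Simplified explore-then-commit; `pull(assignment)` returns one noisy
    scalar reward for the whole partition (full-bandit feedback)."""
    # Assumed exploration length of order T^(2/3), clipped to [1, horizon].
    m = max(1, min(horizon, round(horizon ** (2 / 3))))
    total = [[0.0] * n_agents for _ in range(n_items)]
    count = [[0] * n_agents for _ in range(n_items)]
    rewards = []

    # Exploration: play uniformly random partitions and credit the observed
    # reward to every (item, agent) pair that appeared in the partition.
    for _ in range(m):
        assignment = [rng.randrange(n_agents) for _ in range(n_items)]
        reward = pull(assignment)
        rewards.append(reward)
        for item, agent in enumerate(assignment):
            total[item][agent] += reward
            count[item][agent] += 1

    def mean(item, agent):
        # Pairs never observed are ranked last.
        if count[item][agent] == 0:
            return float("-inf")
        return total[item][agent] / count[item][agent]

    # Commit: give each item to the agent with the highest average reward.
    committed = [
        max(range(n_agents), key=lambda agent: mean(item, agent))
        for item in range(n_items)
    ]
    for _ in range(horizon - m):
        rewards.append(pull(committed))
    return committed, rewards
```

A caller would build an instance with `make_toy_instance`, wrap `coverage_welfare` plus noise as the `pull` function, and pass both to `explore_then_commit`. The per-pair averaging estimator is a deliberately crude stand-in for the submodular optimization step described in the abstract.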
Problem

Research questions and friction points this paper is trying to address.

Submodular Welfare Problem
Multi-Agent Combinatorial Bandits
Bandit Feedback
Monotone Submodular Utilities
Allocation Constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Combinatorial Bandits
Submodular Welfare Problem
Bandit Feedback
Explore-Then-Commit
Regret Bound