Learning to Coordinate Under Threshold Rewards: A Cooperative Multi-Agent Bandit Framework

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies collaborative exploration in multi-agent bandits under two key challenges: unknown activation thresholds and hidden decoy arms. Rewards are triggered only when at least $k$ agents simultaneously select the same arm, and decoy arms yield no true reward even when activated. It is the first work to jointly model threshold uncertainty and decoy interference. The authors propose T-Coop-UCB, a decentralized algorithm built on a distributed UCB framework that concurrently learns both the activation threshold and the true reward distributions of the arms. By leveraging joint confidence intervals and an adaptive coalition-formation mechanism, it avoids unproductive collaboration. Theoretical analysis establishes a sublinear regret bound, and experiments show that T-Coop-UCB significantly outperforms baselines in cumulative reward, coordination efficiency, and regret, closely approaching oracle performance, which validates the efficacy of joint threshold learning and decoy identification.
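To make the setting concrete, here is a minimal sketch of the reward dynamics the summary describes. The class name, Bernoulli reward model, and API are all illustrative assumptions, not the paper's implementation: the only properties taken from the source are that an arm pays out only when at least a threshold number of agents pull it in the same round, and that decoy arms yield no true reward even when activated.

```python
import random


class ThresholdBanditEnv:
    """Toy model of the paper's setting (a sketch under assumed dynamics):
    an arm pays out only when at least `threshold` agents pull it in the
    same round, and decoy arms pay nothing even when activated."""

    def __init__(self, means, decoys, threshold, seed=0):
        self.means = means          # assumed Bernoulli mean per arm
        self.decoys = set(decoys)   # indices of decoy arms (zero true reward)
        self.threshold = threshold  # unknown to the agents in the paper
        self.rng = random.Random(seed)

    def step(self, pulls):
        """pulls: one chosen arm index per agent. Returns per-agent rewards."""
        counts = {}
        for a in pulls:
            counts[a] = counts.get(a, 0) + 1
        rewards = []
        for a in pulls:
            # Activation requires a large-enough coalition on a non-decoy arm.
            activated = counts[a] >= self.threshold and a not in self.decoys
            p = self.means[a] if activated else 0.0
            rewards.append(1.0 if self.rng.random() < p else 0.0)
        return rewards


env = ThresholdBanditEnv(means=[0.2, 0.9, 0.7], decoys=[2], threshold=2)
rewards = env.step([1, 1, 1])  # three agents coordinate on arm 1: activated
print(len(rewards))  # 3 (one reward per agent)
```

Note how "wasted joint exploration" arises here: a coalition on arm 2 meets the threshold yet always earns zero, which is indistinguishable, from a single round, from an unlucky Bernoulli draw on a good arm.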

📝 Abstract
Cooperative multi-agent systems often face tasks that require coordinated actions under uncertainty. While multi-armed bandit (MAB) problems provide a powerful framework for decentralized learning, most prior work assumes individually attainable rewards. We address the challenging setting where rewards are threshold-activated: an arm yields a payoff only when a minimum number of agents pull it simultaneously, with this threshold unknown in advance. Complicating matters further, some arms are decoys, requiring coordination to activate but yielding no reward, which introduces a new challenge of wasted joint exploration. We introduce Threshold-Coop-UCB (T-Coop-UCB), a decentralized algorithm that enables agents to jointly learn activation thresholds and reward distributions, forming effective coalitions without centralized control. Empirical results show that T-Coop-UCB consistently outperforms baseline methods in cumulative reward, regret, and coordination metrics, achieving near-oracle performance. Our findings underscore the importance of joint threshold learning and decoy avoidance for scalable, decentralized cooperation in complex multi-agent systems.
Problem

Research questions and friction points this paper is trying to address.

Learning coordination under unknown threshold-activated rewards
Identifying decoy arms that require coordination but yield no reward
Decentralized multi-agent cooperation without centralized control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized threshold-activated reward learning
Coalition formation without centralized control
Joint threshold and decoy avoidance learning
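The joint-learning idea in the bullets above can be sketched as a single agent state: a standard UCB index per arm plus an interval for the unknown activation threshold. Everything here is a hedged simplification, not the paper's T-Coop-UCB: the class name, the confidence-bonus form, and the update rule are assumptions. In particular, only the threshold's upper bound shrinks on a successful payout, because a zero reward is ambiguous (too-small coalition, decoy arm, or an unlucky stochastic draw).

```python
import math


class TCoopUCBAgent:
    """Illustrative sketch of joint threshold + reward learning (not the
    paper's exact algorithm): a UCB index per arm and an interval
    [k_lo, k_hi] for the unknown activation threshold."""

    def __init__(self, n_arms, n_agents):
        self.counts = [0] * n_arms    # pulls per arm
        self.sums = [0.0] * n_arms    # cumulative reward per arm
        self.k_lo, self.k_hi = 1, n_agents  # threshold lies in [1, n_agents]
        self.t = 0                    # round counter

    def ucb(self, arm):
        if self.counts[arm] == 0:
            return float("inf")       # force initial exploration
        mean = self.sums[arm] / self.counts[arm]
        bonus = math.sqrt(2 * math.log(max(self.t, 2)) / self.counts[arm])
        return mean + bonus

    def select(self):
        self.t += 1
        return max(range(len(self.counts)), key=self.ucb)

    def observe(self, arm, coalition_size, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
        if reward > 0:
            # A payout proves this coalition size met the threshold,
            # so the upper bound can only shrink; a zero reward is
            # ambiguous and leaves the interval unchanged.
            self.k_hi = min(self.k_hi, coalition_size)


agent = TCoopUCBAgent(n_arms=3, n_agents=5)
agent.observe(arm=1, coalition_size=3, reward=1.0)  # success shrinks upper bound
print(agent.k_lo, agent.k_hi)  # 1 3
```

In the paper's decentralized setting, agents would additionally need to agree on which arm to test and at what coalition size; the adaptive coalition-formation mechanism described in the summary fills that role, and is omitted here.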