Competitive Multi-armed Bandit Games for Resource Sharing

📅 2025-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies online decision-making by non-myopic agents competing for scarce resources in an unknown stochastic environment, modeled as an $N$-player, $K$-arm competitive multi-armed bandit (CMAB) game. Because of reward collisions and time-varying payoffs, standard myopic analysis does not apply, and selfish strategies yield an unbounded price of anarchy (PoA). The authors first prove that purely informational mechanisms cannot alleviate this inefficiency. They then propose the first combined information-and-side-payment (CISP) mechanism, which is incentive-compatible, ex-post budget-balanced, and achieves the optimal PoA of 1. Their analysis shows that the mechanism converges within time $O\big(\frac{K}{N\beta^2}\ln(K/\delta)\big)$, strictly improving on the $\Omega\big(\frac{K}{\beta^2}\ln(KN/\delta)\big)$ lower bound for selfish strategies and thereby enhancing both system efficiency and fairness.
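
To make the gap between the two bounds concrete, the sketch below evaluates both expressions with every hidden constant set to 1. That is an assumption: the bounds are asymptotic orders, so only the trend of their ratio is meaningful, not the absolute numbers. The function names and parameter values are hypothetical. In the ratio the leading $K/\beta^2$ factor cancels, leaving $N\ln(KN/\delta)/\ln(K/\delta)$, which grows slightly faster than linearly in $N$.

```python
import math


def selfish_order(K: int, N: int, beta: float, delta: float) -> float:
    """K / beta^2 * ln(KN / delta): the selfish lower bound with constants set to 1."""
    return K / beta**2 * math.log(K * N / delta)


def coordinated_order(K: int, N: int, beta: float, delta: float) -> float:
    """K / (N beta^2) * ln(K / delta): the coordinated upper bound with constants set to 1."""
    return K / (N * beta**2) * math.log(K / delta)


def speedup(K: int, N: int, beta: float, delta: float) -> float:
    """Ratio of the two orders; the K / beta^2 factor cancels,
    leaving N * ln(KN / delta) / ln(K / delta)."""
    return selfish_order(K, N, beta, delta) / coordinated_order(K, N, beta, delta)


if __name__ == "__main__":
    # Illustrative parameter values (hypothetical, not taken from the paper).
    for n_players in (2, 5, 10, 50):
        print(n_players, round(speedup(K=10, N=n_players, beta=0.1, delta=0.05), 1))
```

Because the comparison pits an upper bound against a lower bound, the ratio indicates how the guarantees scale with $N$ rather than a measured speedup.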

📝 Abstract
In modern resource-sharing systems, multiple agents access limited resources with unknown stochastic conditions to perform tasks. When multiple agents access the same resource (arm) simultaneously, they compete for successful usage, leading to contention and reduced rewards. This motivates our study of competitive multi-armed bandit (CMAB) games. In this paper, we study a new $N$-player $K$-arm competitive MAB game, in which non-myopic players (agents) compete with each other while forming diverse private estimates of the unknown arms over time. Their possible collisions on the same arms and the time-varying nature of arm rewards make the policy analysis more involved than in existing studies of myopic players. We explicitly analyze the threshold-based structures of the social optimum and of the existing selfish policy, showing that the latter causes a prolonged convergence time of $\Omega\big(\frac{K}{\beta^2}\ln\frac{KN}{\delta}\big)$, while the socially optimal policy with coordinated communication reduces it to $\mathcal{O}\big(\frac{K}{N\beta^2}\ln\frac{K}{\delta}\big)$. Based on this comparison, we prove that competition among selfish players for the best arm can result in an infinite price of anarchy (PoA), i.e., an arbitrarily large efficiency loss compared to the social optimum. We further prove that no informational (non-monetary) mechanism, including Bayesian persuasion, can reduce the infinite PoA, because strategic misreporting by non-myopic players undermines such approaches. To address this, we propose a Combined Informational and Side-Payment (CISP) mechanism, which provides socially optimal arm recommendations with proper informational and monetary incentives to players according to their time-varying private beliefs. Our CISP mechanism keeps the social planner's budget balanced ex post and ensures truthful reporting from players, achieving the minimum PoA of 1 and the same convergence time as the social optimum.
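
The shape of both convergence times can be reproduced with a generic back-of-the-envelope argument, sketched below under assumptions that are not from the paper: rewards lie in $[0,1]$, $\beta$ is the gap that must be resolved between arm means, $\delta$ is the allowed failure probability, and a player stops exploring an arm once a two-sided Hoeffding confidence radius drops below $\beta/2$ (a stand-in for the paper's threshold structure, not its actual policy). Independent selfish learners each sample all $K$ arms, ignore collisions, and need a union bound over $KN$ estimates; coordinated players pool observations, pull $N \le K$ distinct arms per round, and need a union bound over only $K$ estimates. The helper names are hypothetical.

```python
import math


def samples_per_arm(beta: float, delta_prime: float) -> int:
    """Samples needed so a [0, 1]-reward empirical mean is within beta/2 of its true
    mean with probability >= 1 - delta_prime (two-sided Hoeffding bound)."""
    # P(|mean - mu| >= eps) <= 2 * exp(-2 * n * eps**2); solve for n with eps = beta / 2.
    return math.ceil(2.0 * math.log(2.0 / delta_prime) / beta**2)


def rounds_independent(K: int, N: int, beta: float, delta: float) -> int:
    """Each of the N players explores all K arms alone (collisions ignored);
    union bound over K * N estimates."""
    return K * samples_per_arm(beta, delta / (K * N))


def rounds_coordinated(K: int, N: int, beta: float, delta: float) -> int:
    """Players share observations and pull N distinct arms per round;
    union bound over the K pooled estimates only."""
    if N > K:
        raise ValueError("this sketch assumes N <= K so players can avoid collisions")
    return math.ceil(K * samples_per_arm(beta, delta / K) / N)


if __name__ == "__main__":
    K, N, beta, delta = 10, 5, 0.1, 0.05  # illustrative values only
    print("independent:", rounds_independent(K, N, beta, delta))
    print("coordinated:", rounds_coordinated(K, N, beta, delta))
```

The $1/N$ factor comes from pooling and the smaller log term from the smaller union bound, which matches the structure of $\mathcal{O}\big(\frac{K}{N\beta^2}\ln\frac{K}{\delta}\big)$ versus $\Omega\big(\frac{K}{\beta^2}\ln\frac{KN}{\delta}\big)$; the constants here are specific to this Hoeffding argument.
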
Problem

Research questions and friction points this paper is trying to address.

Analyzes competitive multi-armed bandit games for resource sharing
Studies non-myopic players' collisions and time-varying arm rewards
Proposes a mechanism that reduces the infinite price of anarchy to the optimal value of 1
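
A single-round snapshot shows why collisions matter. The sketch below assumes one particular collision model, chosen for illustration and not taken from the paper: an arm pulled by one or more players yields its mean reward once, split equally among the players who pulled it, so every extra player on an occupied arm adds nothing to total welfare. The arm means and function names are hypothetical. This static toy only shows the capacity wasted by crowding; the infinite PoA in the paper arises from the learning dynamics over time and is not reproduced here.

```python
from collections import Counter
from typing import List, Sequence


def expected_welfare(choices: Sequence[int], means: Sequence[float]) -> float:
    """Expected total reward in one round: each occupied arm contributes its mean once
    (assumed collision model), however many players pulled it."""
    return sum(means[k] for k in set(choices))


def per_player_reward(choices: Sequence[int], means: Sequence[float]) -> List[float]:
    """Each player's expected share: the arm's mean split equally among its pullers."""
    load = Counter(choices)
    return [means[k] / load[k] for k in choices]


if __name__ == "__main__":
    means = [0.9, 0.8, 0.7, 0.2]  # hypothetical arm means, best arm first
    crowded = [0, 0, 0]           # all three players chase the best arm
    spread = [0, 1, 2]            # players occupy the top three arms, one each
    print(expected_welfare(crowded, means), per_player_reward(crowded, means))
    print(expected_welfare(spread, means), per_player_reward(spread, means))
```

With these means the crowded profile earns 0.9 in total versus about 2.4 when spread out. Once the means are known, a player in the crowded profile would rather move to arm 1 (0.8 alone versus a 0.3 share), so the toy illustrates the per-round cost of collisions, not an equilibrium.
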
Innovation

Methods, ideas, or system contributions that make the work stand out.

Competitive multi-armed bandit game analysis
Threshold-based social optimum policy
Combined Informational and Side-Payment mechanism
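
Ex-post budget balance means the planner's side payments sum to zero in every realization, so the planner neither injects nor keeps money. The toy rule below illustrates only that accounting property. It is a hypothetical example, not the CISP payment rule, and it makes no attempt at the incentive compatibility (truthful reporting of private beliefs) that CISP is designed to provide. Each player is owed some compensation, for example for following a recommendation to a less attractive arm, and the total owed is charged equally to all players.

```python
from typing import Dict


def balanced_transfers(owed: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical budget-balanced transfer rule (not the paper's CISP rule).

    owed[p] is the compensation promised to player p this round (0 if none).
    Returns net transfers: positive means p receives money, negative means p pays.
    The total owed is charged equally to every player, so net transfers sum to zero.
    """
    if not owed:
        return {}
    charge = sum(owed.values()) / len(owed)
    return {player: amount - charge for player, amount in owed.items()}


if __name__ == "__main__":
    owed = {"p1": 0.0, "p2": 0.3, "p3": 0.6}  # illustrative amounts
    transfers = balanced_transfers(owed)
    print(transfers, sum(transfers.values()))
```

Here p1 pays about 0.3, p3 receives about 0.3, and p2 nets roughly zero, so the planner's balance is zero up to floating-point rounding.
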