Communication-Corruption Coupling and Verification in Cooperative Multi-Objective Bandits

📅 2026-01-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses team regret control in cooperative multi-agent, multi-objective stochastic bandits under adversarial corruption and limited verification. By introducing a communication-corruption coupling mechanism, it analyzes how different information-sharing protocols (raw samples, statistical summaries, or recommendations only) determine the effective impact of a global corruption budget Γ. The work establishes, for the first time, a threshold on the number of verified observations beyond which learnability is restored. Using a vector-reward stochastic multi-armed bandit model, Lipschitz scalarization, and information-theoretic lower bounds, it proves that sharing summaries or recommendations achieves the centralized regret rate while incurring only an O(Γ) corruption penalty. Moreover, when the number of verified observations exceeds the derived threshold, the team regret becomes independent of Γ.
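The dependence of the corruption penalty on the sharing protocol can be illustrated with a toy accounting. This is a minimal sketch, not the paper's multiplicity functional: it assumes that under raw-sample sharing every corrupted observation is ingested by all N agents' estimators, while under summary or recommendation-only sharing it is counted once. The function name `effective_corruption` and the protocol labels are hypothetical.

```python
def effective_corruption(perturbations, n_agents, protocol):
    """Toy accounting of effective corruption (an assumption, not the paper's definition).

    perturbations: magnitudes of the adversary's perturbations; their sum is the
                   environment-side budget Gamma actually spent.
    protocol:      "raw", "summary", or "recommendation" (hypothetical labels).
    """
    # Assumed multiplicity: how many agent-side estimators ingest each corrupted value.
    multiplicity = {"raw": n_agents, "summary": 1, "recommendation": 1}[protocol]
    return multiplicity * sum(perturbations)


perturbations = [1.0, 0.5, 1.5]  # Gamma = 3.0
n_agents = 4
for protocol in ("raw", "summary", "recommendation"):
    print(protocol, effective_corruption(perturbations, n_agents, protocol))
```

Under these assumptions the same budget Γ yields an effective level of NΓ for raw-sample sharing and Γ for the other two protocols, which matches the range stated in the abstract.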

📝 Abstract

We study cooperative stochastic multi-armed bandits with vector-valued rewards under adversarial corruption and limited verification. In each of $T$ rounds, each of $N$ agents selects an arm, the environment generates a clean reward vector, and an adversary perturbs the observed feedback subject to a global corruption budget $\Gamma$. Performance is measured by team regret under a coordinate-wise nondecreasing, $L$-Lipschitz scalarization $\phi$, covering linear, Chebyshev, and smooth monotone utilities. Our main contribution is a communication-corruption coupling: we show that a fixed environment-side budget $\Gamma$ can translate into an effective corruption level ranging from $\Gamma$ to $N\Gamma$, depending on whether agents share raw samples, sufficient statistics, or only arm recommendations. We formalize this via a protocol-induced multiplicity functional and prove regret bounds parameterized by the resulting effective corruption. As corollaries, raw-sample sharing can suffer an $N$-fold larger additive corruption penalty, whereas summary sharing and recommendation-only sharing preserve an unamplified $O(\Gamma)$ term and achieve centralized-rate team regret. We further establish information-theoretic limits, including an unavoidable additive $\Omega(\Gamma)$ penalty and a high-corruption regime $\Gamma=\Theta(NT)$ where sublinear regret is impossible without clean information. Finally, we characterize how a global budget $\nu$ of verified observations restores learnability. That is, verification is necessary in the high-corruption regime, and sufficient once it crosses the identification threshold, with certified sharing enabling the team's regret to become independent of $\Gamma$.
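As a concrete reading of the scalarization class, the sketch below implements a linear and a Chebyshev-style utility and spot-checks that each is coordinate-wise nondecreasing and Lipschitz. The specific forms (nonnegative weights `w`, Chebyshev taken as the weighted minimum over coordinates) and the use of the sup norm for the Lipschitz constant are assumptions for illustration; the paper may use different conventions.

```python
import random


def linear(x, w):
    """Linear scalarization: weighted sum with nonnegative weights."""
    return sum(wi * xi for wi, xi in zip(w, x))


def chebyshev(x, w):
    """Chebyshev-style scalarization (assumed form): weighted worst coordinate."""
    return min(wi * xi for wi, xi in zip(w, x))


def sup_dist(x, y):
    """Sup-norm distance between two reward vectors."""
    return max(abs(a - b) for a, b in zip(x, y))


def spot_check(phi, w, lipschitz, trials=1000, seed=0):
    """Randomly test monotonicity and the L-Lipschitz property in the sup norm."""
    rng = random.Random(seed)
    d = len(w)
    for _ in range(trials):
        x = [rng.random() for _ in range(d)]
        # y dominates x coordinate-wise, so a monotone phi must not decrease.
        y = [xi + 0.5 * rng.random() for xi in x]
        if phi(y, w) < phi(x, w) - 1e-12:
            return False
        # z is an arbitrary second point for the Lipschitz check.
        z = [rng.random() for _ in range(d)]
        if abs(phi(x, w) - phi(z, w)) > lipschitz * sup_dist(x, z) + 1e-12:
            return False
    return True


w = [0.5, 0.3, 0.2]
print(spot_check(linear, w, lipschitz=sum(w)))     # L = sum of weights
print(spot_check(chebyshev, w, lipschitz=max(w)))  # L = largest weight
```

With respect to the sup norm, the linear utility is Lipschitz with constant equal to the sum of the weights, and the weighted-minimum utility with constant equal to the largest weight, so both checks should pass.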

Problem

Research questions and friction points this paper is trying to address.

cooperative bandits

adversarial corruption

communication protocols

multi-objective rewards

verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

communication-corruption coupling

cooperative multi-armed bandits

adversarial corruption