🤖 AI Summary
This paper addresses strategic manipulation in Bayesian multi-armed bandits (MAB), where agents can register or replicate arms ("shadow arms") to bias the learner in their favor, posing critical challenges to incentive compatibility and fairness when each agent knows only the private prior distribution from which its arms' mean rewards are drawn.
Method: We establish the first mechanism design framework robust to arm replication. We derive necessary and sufficient conditions for replication-proofness in both single- and multi-agent settings, model the problem as a Bayesian game, and design a replication-proof algorithm with rigorous theoretical guarantees. We further introduce a novel analytical paradigm, comparing expected regret across multiple problem instances, to evaluate robustness.
Contribution/Results: Our algorithm achieves sublinear regret, matching the benchmark of Shin et al. (2022), and is provably replication-proof under arbitrary single-agent and multi-agent configurations. This work provides foundational theoretical support for trustworthy online learning mechanisms.
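To see why replication-proofness is needed at all, the following minimal sketch (not the paper's algorithm; the epsilon-greedy rule, reward means, and all parameters here are hypothetical choices for illustration) shows how a naive bandit algorithm rewards an agent for registering duplicate copies of its arm: uniform exploration alone hands the replicating agent a larger share of pulls.

```python
import random

def run_naive_bandit(arm_means, rounds=5000, seed=0):
    """Play a naive epsilon-greedy bandit with Bernoulli rewards;
    return the number of pulls each arm received."""
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    sums = [0.0] * n
    for _ in range(rounds):
        if rng.random() < 0.1 or 0 in counts:
            i = rng.randrange(n)  # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the best empirical mean so far
            i = max(range(n), key=lambda j: sums[j] / counts[j])
        counts[i] += 1
        sums[i] += 1.0 if rng.random() < arm_means[i] else 0.0
    return counts

# Agent A owns one arm (mean 0.4); a rival arm has mean 0.5.
honest = run_naive_bandit([0.4, 0.5])
# If A instead registers three identical replicas of its arm,
# exploration splits 3-to-1 in A's favor.
replicated = run_naive_bandit([0.4, 0.4, 0.4, 0.5])
share_honest = honest[0] / sum(honest)
share_replicated = sum(replicated[:3]) / sum(replicated)
print(f"honest share: {share_honest:.2f}, replicated share: {share_replicated:.2f}")
```

Because the agent's share of pulls (and hence payoff) grows with the number of replicas, such an algorithm is not replication-proof; the paper's mechanism is designed so that this deviation never increases the agent's expected payoff.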
📝 Abstract
We study the problem of designing replication-proof bandit mechanisms when agents strategically register or replicate their own arms to maximize their payoff. Specifically, we consider Bayesian agents who know only the distribution from which their own arms' mean rewards are sampled, unlike the original setting of Shin et al. (2022). Interestingly, with Bayesian agents, in stark contrast to the previous work, analyzing the replication-proofness of an algorithm becomes significantly more complicated even in the single-agent setting. We provide sufficient and necessary conditions for an algorithm to be replication-proof in the single-agent setting, and present an algorithm that satisfies these properties. These results center around several analytical theorems that focus on *comparing the expected regret of multiple bandit instances*, and may therefore be of independent interest, since to the best of our knowledge such comparisons have not been studied before. We extend this result to the multi-agent setting and provide a replication-proof algorithm for any problem instance. We conclude by proving a sublinear regret upper bound for our algorithm that matches that of Shin et al. (2022).