AI Summary
This paper studies a multi-player stochastic reward resource fair-sharing game, where resource rewards follow an unknown distribution and are equally split among players selecting the same resource. We consider two settings: (i) a one-shot worst-case optimization with known reward means, and (ii) an online multi-period learning setting with unknown means. We formally model the worst-case fair-sharing game for the first time and propose Robust-UCB, the first algorithm tailored to this setting. Theoretically, we derive a closed-form solution for the one-shot setting and establish an $O(\sqrt{T})$ worst-case regret bound in the online setting, significantly outperforming baseline strategies. Our analysis uncovers non-intuitive Nash equilibrium structures and unifies game theory, robust optimization, and online learning. The work establishes a novel paradigm for fair resource allocation under incomplete information.
Abstract
This paper considers a multi-player resource-sharing game with a fair reward allocation model. Multiple players choose from a collection of resources. Each resource brings a random reward that is equally divided among the players who choose it. We consider two settings. The first is a one-shot game in which the mean rewards of the resources are known to all players, and the objective of player 1 is to maximize their worst-case expected utility. Certain special cases of this setting have explicit solutions, and these cases provide interesting yet non-intuitive insights into the problem. The second is an online setting, where the game is played over a finite time horizon and the mean rewards are unknown to the first player. Instead, after each action the first player receives, as feedback, the rewards of the resources they chose. We develop a novel Upper Confidence Bound (UCB) algorithm that minimizes the worst-case regret of the first player using the feedback received.
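To make the online setting concrete, the sketch below shows a standard single-player UCB1 index policy on stochastic rewards. This is only an illustration of the UCB machinery the abstract refers to, not the paper's Robust-UCB algorithm: it ignores the fair-sharing split and the worst-case behavior of the other players, and the reward model (Bernoulli), horizon, and means are illustrative assumptions.

```python
import math
import random

def ucb_indices(means_hat, counts, t, c=2.0):
    """Classic UCB1 index per resource: empirical mean plus an
    exploration bonus that shrinks as the resource is sampled more.
    Unsampled resources get an infinite index so each is tried once."""
    return [
        float("inf") if n == 0 else m + math.sqrt(c * math.log(t) / n)
        for m, n in zip(means_hat, counts)
    ]

def run_ucb(true_means, horizon, seed=0):
    """Run single-player UCB1 with bandit feedback: after each round the
    player observes only the reward of the resource they chose."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of times each resource was chosen
    means_hat = [0.0] * k     # running empirical mean reward per resource
    total = 0.0
    for t in range(1, horizon + 1):
        idx = ucb_indices(means_hat, counts, t)
        a = max(range(k), key=lambda i: idx[i])
        # Illustrative reward model: Bernoulli with the resource's mean.
        r = 1.0 if rng.random() < true_means[a] else 0.0
        counts[a] += 1
        means_hat[a] += (r - means_hat[a]) / counts[a]  # incremental mean
        total += r
    return total, counts

reward, counts = run_ucb([0.2, 0.5, 0.8], horizon=2000)
```

Over the horizon, the bonus term concentrates play on the highest-mean resource, which is what yields the sublinear regret that the paper's worst-case analysis extends to the shared-reward, adversarial setting.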