🤖 AI Summary
This paper addresses the distributed learning problem for $N$ players competing across $K$ parallel Tug-of-War (ToW) games: at each step, each player selects exactly one game, and that player's reward depends on the joint actions of all players in the same game—together forming a "Meta-ToW" system applicable to power control, task allocation, and sensor activation. To tackle the strong strategic coupling and communication constraints, the authors propose the Meta Tug-of-Peace algorithm, which enables decentralized game-switching decisions via low-frequency, 1-bit broadcast signals and updates action policies using stochastic approximation. They prove that the algorithm converges almost surely to an approximate Nash equilibrium satisfying prescribed quality-of-service (QoS) requirements. Extensive simulations demonstrate its efficacy in achieving system-wide equilibrium and QoS guarantees across diverse scenarios, while significantly reducing communication overhead compared to conventional approaches.
📝 Abstract
Consider $N$ players and $K$ games taking place simultaneously. Each of these games is modeled as a Tug-of-War (ToW) game, in which increasing the action of one player decreases the rewards of all other players. Each player participates in only one game at any given time. At each time step, a player decides which game to participate in and which action to take in that game. Their reward depends on the actions of all players in the same game. This system of $K$ games is termed a "Meta Tug-of-War" (Meta-ToW) game. Such games can model scenarios such as power control, distributed task allocation, and activation in sensor networks. We propose the Meta Tug-of-Peace algorithm, a distributed algorithm in which action updates are performed via a simple stochastic approximation scheme, and the decision to switch games is made using infrequent 1-bit communication between the players. We prove that in Meta-ToW games, our algorithm converges to an equilibrium that satisfies a target Quality of Service reward vector for the players. We then demonstrate the efficacy of our algorithm through simulations for the scenarios mentioned above.
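To make the setup concrete, below is a minimal toy sketch of one Meta-ToW-style dynamic. The paper's exact reward model, update rule, and switching protocol are not given in the text above, so every functional form and constant here (the reward function, step size, QoS target, tolerance, and the random-hop switching rule) is an illustrative assumption, not the authors' algorithm: each player nudges its action via stochastic approximation toward a common QoS reward target, and, at infrequent intervals, an unsatisfied player emits a single "unsatisfied" bit and may hop to another game.

```python
import random

random.seed(0)

# Illustrative toy model only -- all forms and constants are assumptions.
N_PLAYERS, N_GAMES = 6, 2
QOS = 0.2      # assumed per-player QoS reward target
STEP = 0.05    # stochastic-approximation step size
TOL = 0.02     # satisfaction tolerance before signaling

actions = [random.uniform(0.1, 1.0) for _ in range(N_PLAYERS)]
game_of = [i % N_GAMES for i in range(N_PLAYERS)]  # current game choice

def reward(i):
    """ToW-style reward: increases in player i's own action and decreases
    as the other players in the same game raise theirs (assumed form)."""
    others = sum(actions[j] for j in range(N_PLAYERS)
                 if j != i and game_of[j] == game_of[i])
    return actions[i] / (1.0 + actions[i] + others)

for t in range(3000):
    # Frequent step: stochastic-approximation update toward the QoS target.
    for i in range(N_PLAYERS):
        actions[i] += STEP * (QOS - reward(i))
        actions[i] = min(1.0, max(0.01, actions[i]))
    # Infrequent step: each unsatisfied player broadcasts one bit and may
    # hop to a uniformly random game (assumed switching rule).
    if t % 50 == 0:
        for i in range(N_PLAYERS):
            bit = reward(i) < QOS - TOL  # the 1-bit "unsatisfied" signal
            if bit and random.random() < 0.3:
                game_of[i] = random.randrange(N_GAMES)

print([round(reward(i), 3) for i in range(N_PLAYERS)])
```

In this toy version, a game with too many players cannot deliver the target reward to everyone, so its unsatisfied players keep signaling and hopping until the load spreads out; the actual paper replaces these heuristics with updates proven to converge to a QoS-satisfying equilibrium.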