🤖 AI Summary
This paper investigates whether unintended collusion arises when two players independently employ Thompson sampling in a repeated blind game with unknown payoff matrices. Under a mild assumption on the payoff matrices, the authors prove that the players' strategies converge almost surely (with probability one) to a Nash equilibrium, thereby precluding algorithmic collusion. To establish this result, they develop the first sample-path convergence analysis applicable to settings with infrequent parameter updates and non-Lipschitz dynamics, which lie beyond the scope of classical stochastic approximation methods. Integrating multi-armed bandit theory, Bayesian adaptive decision-making, and game-theoretic equilibrium analysis, the paper rigorously derives asymptotic rationality guarantees for Thompson sampling in non-cooperative stochastic games. These theoretical findings provide foundational support for algorithmic fairness and interpretability in decentralized learning systems.
📝 Abstract
When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of each other's existence and use multi-armed bandit algorithms to choose their actions; we refer to this setting as the ``blindfolded game'' in this paper. We show that when both players use Thompson sampling, the game dynamics converge to the Nash equilibrium under a mild assumption on the payoff matrices. Therefore, algorithmic collusion does not arise in this case, even though the players never intentionally deploy competitive strategies. To prove the convergence result, we find that the framework developed in stochastic approximation does not apply, because the inferior actions are updated only sporadically and infrequently and the dynamics lack Lipschitz continuity. We instead develop a novel sample-path-wise approach to establish the convergence.
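To make the setting concrete, here is a minimal simulation sketch of the blindfolded game: two independent Thompson samplers, each with Beta(1, 1) priors over the mean payoffs of its own actions and no knowledge of the opponent, playing a repeated 2x2 game with Bernoulli payoffs. The payoff matrices `A` and `B` below are hypothetical (chosen to be dominance-solvable, so the game has a unique pure Nash equilibrium at (0, 0)) and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean payoff matrices, unknown to the players.
# A[i, j] / B[i, j]: row / column player's mean payoff when the
# row player plays i and the column player plays j.
A = np.array([[0.8, 0.6],
              [0.3, 0.2]])   # row action 0 strictly dominates
B = np.array([[0.7, 0.4],
              [0.9, 0.5]])   # column action 0 strictly dominates

n_actions, horizon = 2, 50_000

# Beta(1, 1) priors for each player's own arms (the "blindfolded"
# setting: each player sees only its own actions and rewards).
succ = [np.ones(n_actions), np.ones(n_actions)]   # Beta alpha params
fail = [np.ones(n_actions), np.ones(n_actions)]   # Beta beta params
counts = np.zeros((n_actions, n_actions))

for t in range(horizon):
    # Thompson sampling: each player samples a mean for every action
    # from its posterior and plays the argmax.
    i = int(np.argmax(rng.beta(succ[0], fail[0])))
    j = int(np.argmax(rng.beta(succ[1], fail[1])))
    counts[i, j] += 1

    # Bernoulli payoffs drawn from the unknown payoff matrices.
    r0 = float(rng.random() < A[i, j])
    r1 = float(rng.random() < B[i, j])

    # Standard Beta-Bernoulli posterior updates, each player using
    # only its own action and reward.
    succ[0][i] += r0
    fail[0][i] += 1.0 - r0
    succ[1][j] += r1
    fail[1][j] += 1.0 - r1

# Empirical frequencies of the joint action profiles.
print(counts / horizon)
```

In this dominance-solvable example, the printed empirical frequencies should concentrate on the equilibrium cell (0, 0), consistent with the paper's convergence result; it is an illustration of the dynamics, not of the proof technique.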