No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling

📅 2024-05-23
🏛️ arXiv.org
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This paper asks whether unintentional collusion arises when two players independently run Thompson sampling in a repeated "blindfolded" game with unknown payoff matrices. Under a mild assumption on the payoff matrices, the authors show that the game dynamics converge to a Nash equilibrium, so algorithmic collusion does not arise even though neither player deliberately adopts a competitive strategy. Classical stochastic approximation does not apply here, because inferior actions are updated only sporadically and the dynamics lack Lipschitz continuity; the proof instead relies on a novel sample-path-wise argument. The analysis draws on multi-armed bandit theory, Bayesian decision-making, and game-theoretic equilibrium analysis.

📝 Abstract
When two players are engaged in a repeated game with unknown payoff matrices, they may be completely unaware of the existence of each other and use multi-armed bandit algorithms to choose the actions, which is referred to as the "blindfolded game" in this paper. We show that when the players use Thompson sampling, the game dynamics converges to the Nash equilibrium under a mild assumption on the payoff matrices. Therefore, algorithmic collusion doesn't arise in this case despite the fact that the players do not intentionally deploy competitive strategies. To prove the convergence result, we find that the framework developed in stochastic approximation doesn't apply, because of the sporadic and infrequent updates of the inferior actions and the lack of Lipschitz continuity. We develop a novel sample-path-wise approach to show the convergence.
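As a rough illustration of the blindfolded setting described in the abstract, the following Python sketch lets two players run Thompson sampling independently, each observing only its own noisy payoff and never the other player's action. Everything specific here is an assumption made for illustration and is not taken from the paper: the prisoner's-dilemma payoff values, the unit-variance Gaussian noise, the N(0, 1) prior, and the names `GaussianThompsonPlayer`, `simulate`, and `nash_fraction`.

```python
import random

# Hypothetical prisoner's-dilemma mean payoffs (illustrative, not from the paper).
# Action 0 = "cooperate", action 1 = "defect"; (1, 1) is the only pure Nash profile.
PAYOFF_A = [[3.0, 0.0], [5.0, 1.0]]  # row player's mean payoff, indexed [a][b]
PAYOFF_B = [[3.0, 5.0], [0.0, 1.0]]  # column player's mean payoff, indexed [a][b]


class GaussianThompsonPlayer:
    """Thompson sampling assuming a N(0, 1) prior and unit-variance Gaussian rewards."""

    def __init__(self, n_actions, rng):
        self.counts = [0] * n_actions   # number of pulls per action
        self.sums = [0.0] * n_actions   # sum of observed rewards per action
        self.rng = rng

    def choose(self):
        # Posterior for each action is N(sum / (n + 1), 1 / (n + 1));
        # draw one sample per action and play the largest.
        samples = [
            self.rng.gauss(s / (n + 1), (1.0 / (n + 1)) ** 0.5)
            for s, n in zip(self.sums, self.counts)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward):
        # Only the action actually played is updated, so a rarely chosen
        # action keeps a stale estimate for long stretches.
        self.counts[action] += 1
        self.sums[action] += reward


def simulate(rounds=5000, seed=0):
    """Play the repeated game; each player sees only its own noisy payoff."""
    rng = random.Random(seed)
    p1 = GaussianThompsonPlayer(2, rng)
    p2 = GaussianThompsonPlayer(2, rng)
    history = []
    for _ in range(rounds):
        a, b = p1.choose(), p2.choose()
        p1.update(a, PAYOFF_A[a][b] + rng.gauss(0.0, 1.0))
        p2.update(b, PAYOFF_B[a][b] + rng.gauss(0.0, 1.0))
        history.append((a, b))
    return history


def nash_fraction(history, profile, tail=1000):
    """Fraction of the last `tail` rounds in which the given action profile was played."""
    recent = history[-tail:]
    return sum(1 for played in recent if played == profile) / len(recent)
```

Calling `nash_fraction(simulate(), (1, 1))` reports how often the late rounds of one run land on the Nash profile of this toy game. A finite run is not evidence for or against the convergence result, and this summary does not say whether these hypothetical payoffs satisfy the paper's assumption.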
Problem

Research questions and friction points this paper is trying to address.

Whether Thompson sampling is susceptible to algorithmic collusion
Conditions under which play converges to a Nash equilibrium in repeated games
Collusive outcomes when the payoff matrix assumption fails
Innovation

Methods, ideas, or system contributions that make the work stand out.

Thompson sampling dynamics converge to a Nash equilibrium, which rules out algorithmic collusion under the payoff assumption
Novel sample-path-wise approach proves Nash equilibrium convergence
Collusive outcomes are possible when the payoff assumption is violated
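
To make the distinction between a Nash equilibrium and a collusive outcome concrete, here is a minimal Python sketch that enumerates the pure-strategy Nash equilibria of a two-player matrix game. It is a generic textbook check, not the paper's payoff assumption (which this summary does not spell out), and it ignores mixed equilibria. The function name `pure_nash_equilibria` and both example games are hypothetical.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) profiles where neither player gains by deviating alone.

    payoff_a[i][j] is the row player's payoff and payoff_b[i][j] the column
    player's payoff when the row player picks i and the column player picks j.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Row player cannot do better by switching rows while the column stays j.
            row_best = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(n_rows))
            # Column player cannot do better by switching columns while the row stays i.
            col_best = all(payoff_b[i][j] >= payoff_b[i][l] for l in range(n_cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria


# Hypothetical prisoner's dilemma: action 0 = cooperate, action 1 = defect.
pd_a = [[3, 0], [5, 1]]
pd_b = [[3, 5], [0, 1]]
print(pure_nash_equilibria(pd_a, pd_b))  # [(1, 1)]

# Matching pennies has no pure-strategy equilibrium.
mp_a = [[1, -1], [-1, 1]]
mp_b = [[-1, 1], [1, -1]]
print(pure_nash_equilibria(mp_a, mp_b))  # []
```

In the prisoner's-dilemma example, mutual cooperation `(0, 0)` pays both players more than the equilibrium `(1, 1)` but is not stable against unilateral deviation. Learning dynamics that settle on such a profile are what "collusive outcome" refers to in this context.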
Authors
Ningyuan Chen
Department of Management, UTM & Rotman School of Management, University of Toronto
Revenue Management, Online Learning, Operations Management, Business Analytics
Xuefeng Gao
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
Yi Xiong
School of Information Management and Engineering, Shanghai University of Finance and Economics