🤖 AI Summary
This work addresses free-riding by strategic agents in multi-agent Bayesian multi-armed bandit settings, a behavior that impedes collaborative exploration. Focusing on persistent agents that participate over many time periods, with no monetary transfers, the paper introduces CAOS, the first incentive-compatible mechanism based purely on information sharing, under which efficient collaboration is a Nash equilibrium. By combining game-theoretic principles with the Bayesian multi-armed bandit framework, CAOS sustains cooperative exploration through information exchange alone. Theoretical analysis and empirical experiments both show that CAOS achieves regret close to that of fully cooperative systems, establishing that purely informational incentives suffice to support effective multi-agent collaborative learning.
📝 Abstract
We study collaborative learning in multi-agent Bayesian multi-armed bandit (MAB) problems, where strategic agents collectively solve the same bandit instance. While multiple agents can accelerate learning by sharing information, strategic agents might prefer to free-ride and avoid exploration. We consider a setting with persistent agents that participate in multiple time periods. This contrasts with most previous work on incentives in multi-agent MAB, which assumes short-lived agents: each agent makes a single decision and optimizes its expected reward for that decision alone. As in the multi-agent MAB model with incentives, our model has no monetary transfers, and the only incentives come from information sharing.
We propose CAOS, a mechanism that sustains collaboration as a Nash equilibrium while achieving strong regret guarantees. Our results demonstrate that collaborative exploration can be sustained purely through information sharing, achieving performance close to that of fully cooperative systems despite strategic behavior.
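To make the benefit of information sharing concrete, here is a minimal sketch of posterior pooling for a single arm. It is not the CAOS mechanism, which the abstract does not specify. It assumes Bernoulli rewards with a uniform Beta(1, 1) prior, and the class name `BetaPosterior` and the reward lists are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta posterior over the success probability of one Bernoulli arm (assumed model)."""
    alpha: float = 1.0  # 1 + number of observed successes
    beta: float = 1.0   # 1 + number of observed failures

    def update(self, rewards):
        """Return a new posterior after observing a list of 0/1 rewards."""
        successes = sum(rewards)
        failures = len(rewards) - successes
        return BetaPosterior(self.alpha + successes, self.beta + failures)

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self):
        n = self.alpha + self.beta
        return self.alpha * self.beta / (n * n * (n + 1))


# Hypothetical pulls of the same arm by three agents (illustrative data only).
agent_rewards = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1]]

prior = BetaPosterior()

# An agent that learns only from its own pulls.
solo = prior.update(agent_rewards[0])

# An agent that also sees the other agents' shared observations.
pooled = prior
for rewards in agent_rewards:
    pooled = pooled.update(rewards)

print(f"solo:   mean={solo.mean:.3f}, variance={solo.variance:.4f}")
print(f"pooled: mean={pooled.mean:.3f}, variance={pooled.variance:.4f}")
```

The pooled posterior is tighter than the solo one even though each agent paid for only a third of the pulls. An agent that reads the shared observations without exploring gets the same tighter posterior at no cost, which is the free-riding incentive the abstract describes.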