🤖 AI Summary
This study investigates collective exploration in social learning systems composed of self-interested agents interacting with a multi-armed bandit. Each agent controls a contiguous block of consecutive decisions, termed an "episode". Although each agent has some intrinsic incentive to explore within its own episode, the authors show that aggregate exploration typically fails: Bayesian regret grows linearly in time. Combining Bayesian regret analysis with social learning dynamics, they establish that this failure is the typical case under broad conditions, not merely a worst-case scenario. The result persists across multiple per-episode utility aggregation functions, including min and max as well as the sum, revealing fundamental limitations of organic exploration in decentralized settings. The findings underscore the necessity of external intervention to make collective learning efficient in such systems.
📝 Abstract
We study a stylized social learning dynamics where self-interested agents collectively follow a simple multi-armed bandit protocol. Each agent controls an ``episode'': a short sequence of consecutive decisions. Motivating applications include users repeatedly interacting with an AI, or repeatedly shopping at a marketplace. While agents are incentivized to explore within their respective episodes, we show that the aggregate exploration fails: e.g., its Bayesian regret grows linearly over time. In fact, such failure is a (very) typical case, not just a worst-case scenario. This conclusion persists even if an agent's per-episode utility is some fixed function of the per-round outcomes: e.g., $\min$ or $\max$, not just the sum. Thus, externally driven exploration is needed even when some amount of exploration happens organically.
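To make the failure mode concrete, here is a minimal simulation sketch (not from the paper, and all names such as `simulate`, `greedy`, and `episode_len` are hypothetical). It assumes a two-armed Bernoulli bandit with a uniform prior over arm means, and models each self-interested agent as playing greedily with respect to the public posterior throughout its episode, a stylized stand-in that ignores any within-episode exploration. Thompson sampling serves as a coordinated-exploration baseline. Under these assumptions, the greedy dynamics can herd on the inferior arm with constant probability, so its cumulative Bayesian regret grows roughly linearly, while the baseline's grows sublinearly.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(policy, n_agents=300, episode_len=3, n_sims=500):
    """Average cumulative Bayesian regret for a two-armed Bernoulli bandit
    where each agent controls `episode_len` consecutive pulls and observes
    the full public history (summarized by Beta posterior counts)."""
    horizon = n_agents * episode_len
    regret = np.zeros(horizon)
    for _ in range(n_sims):
        mu = rng.uniform(0, 1, size=2)       # arm means drawn from the prior
        best = mu.max()
        succ = np.ones(2)                    # Beta(1,1) posterior counts
        fail = np.ones(2)
        t = 0
        for _ in range(n_agents):
            for _ in range(episode_len):
                arm = policy(succ, fail, rng)
                r = rng.random() < mu[arm]   # Bernoulli reward
                succ[arm] += r
                fail[arm] += 1 - r
                regret[t] += best - mu[arm]
                t += 1
    return np.cumsum(regret / n_sims)

def greedy(succ, fail, rng):
    """Self-interested agent (stylized): pull the posterior-mean-best arm."""
    return int(np.argmax(succ / (succ + fail)))

def thompson(succ, fail, rng):
    """Coordinated-exploration baseline: Thompson sampling."""
    return int(np.argmax(rng.beta(succ, fail)))

greedy_regret = simulate(greedy)
ts_regret = simulate(thompson)
T = len(greedy_regret)
print(f"cumulative Bayesian regret at T={T}:")
print(f"  greedy episode play: {greedy_regret[-1]:.1f}  (grows ~linearly)")
print(f"  Thompson sampling:   {ts_regret[-1]:.1f}  (grows sublinearly)")
```

In this sketch the greedy dynamics' regret keeps growing at a near-constant per-round rate, while the Thompson sampling curve flattens, illustrating the abstract's point that exploration must be driven externally rather than left to emerge organically.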