🤖 AI Summary
This study investigates collective exploration in social learning systems composed of self-interested agents interacting with a multi-armed bandit. Each agent controls a contiguous block of consecutive decisions, termed an "episode". Although each agent has some intrinsic incentive to explore within its own episode, the authors show that aggregate exploration typically fails: Bayesian regret grows linearly in time. Combining Bayesian regret analysis with social learning dynamics, they establish that this failure is the typical case under broad conditions, not merely a worst-case scenario. The result persists across multiple per-episode utility aggregation functions, including min and max as well as the sum, revealing fundamental limitations of organic exploration in decentralized settings. The findings underscore the necessity of external intervention to make collective learning efficient in such systems.
📝 Abstract
We study a stylized social learning dynamics where self-interested agents collectively follow a simple multi-armed bandit protocol. Each agent controls an ``episode'': a short sequence of consecutive decisions. Motivating applications include users repeatedly interacting with an AI, or repeatedly shopping at a marketplace. While agents are incentivized to explore within their respective episodes, we show that the aggregate exploration fails: e.g., its Bayesian regret grows linearly over time. In fact, such failure is a (very) typical case, not just a worst-case scenario. This conclusion persists even if an agent's per-episode utility is some fixed function of the per-round outcomes: e.g., $\min$ or $\max$, not just the sum. Thus, externally driven exploration is needed even when some amount of exploration happens organically.
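To make the failure mode concrete, here is a minimal simulation sketch (not from the paper, and all names such as `simulate`, `greedy`, and `episode_len` are hypothetical). It assumes a two-armed Bernoulli bandit with a uniform prior over arm means, and models each self-interested agent as playing greedily with respect to the public posterior throughout its episode, a stylized stand-in that ignores any within-episode exploration. Thompson sampling serves as a coordinated-exploration baseline. Under these assumptions, the greedy dynamics can herd on the inferior arm with constant probability, so its cumulative Bayesian regret grows roughly linearly, while the baseline's grows sublinearly.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(policy, n_agents=300, episode_len=3, n_sims=500):
    """Average cumulative Bayesian regret for a two-armed Bernoulli bandit
    where each agent controls `episode_len` consecutive pulls and observes
    the full public history (summarized by Beta posterior counts)."""
    horizon = n_agents * episode_len
    regret = np.zeros(horizon)
    for _ in range(n_sims):
        mu = rng.uniform(0, 1, size=2)       # arm means drawn from the prior
        best = mu.max()
        succ = np.ones(2)                    # Beta(1,1) posterior counts
        fail = np.ones(2)
        t = 0
        for _ in range(n_agents):
            for _ in range(episode_len):
                arm = policy(succ, fail, rng)
                r = rng.random() < mu[arm]   # Bernoulli reward
                succ[arm] += r
                fail[arm] += 1 - r
                regret[t] += best - mu[arm]
                t += 1
    return np.cumsum(regret / n_sims)

def greedy(succ, fail, rng):
    """Self-interested agent (stylized): pull the posterior-mean-best arm."""
    return int(np.argmax(succ / (succ + fail)))

def thompson(succ, fail, rng):
    """Coordinated-exploration baseline: Thompson sampling."""
    return int(np.argmax(rng.beta(succ, fail)))

greedy_regret = simulate(greedy)
ts_regret = simulate(thompson)
T = len(greedy_regret)
print(f"cumulative Bayesian regret at T={T}:")
print(f"  greedy episode play: {greedy_regret[-1]:.1f}  (grows ~linearly)")
print(f"  Thompson sampling:   {ts_regret[-1]:.1f}  (grows sublinearly)")
```

In this sketch the greedy dynamics' regret keeps growing at a near-constant per-round rate, while the Thompson sampling curve flattens, illustrating the abstract's point that exploration must be driven externally rather than left to emerge organically.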