🤖 AI Summary
This study investigates how strategic communication evolves when an advisor generates messages via reinforcement learning rather than full rationality, within the Crawford-Sobel cheap-talk framework. Combining game-theoretic modeling with the reward-driven adaptation of reinforcement learning, the authors construct and analytically examine a dynamic system. Under aligned preferences, the learning process converges stably to highly informative communication. Under misaligned preferences, where no stable learning outcome exists, reinforcement learning induces persistent cycles that sustain information transmission and payoffs for both players strictly exceeding those of any static equilibrium. This work shows that learning-driven communication can yield efficient information transfer even from uninformative initial policies, beyond what traditional equilibrium analysis predicts.
📝 Abstract
We analyze strategic communication when advice is generated by a reinforcement-learning algorithm rather than by a fully rational sender. Building on the cheap-talk framework of Crawford and Sobel (1982), an advisor adapts its messages based on payoff feedback, while a decision maker best-responds. We provide a theoretical analysis of the long-run communication outcomes induced by such reward-driven adaptation. With aligned preferences, we establish that learning robustly leads to informative communication even from uninformative initial policies. With misaligned preferences, no stable outcome exists; instead, learning generates cycles that sustain highly informative communication and payoffs exceeding those of any static equilibrium.
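The setup described in the abstract can be illustrated with a minimal simulation sketch. Everything concrete below is an assumption for illustration, not the paper's actual model: a small discretized state/message grid, quadratic-loss payoffs (standard in Crawford-Sobel but not specified here), a simple reward-tracking propensity rule with softmax message choice for the sender, and a receiver that best-responds with its running posterior mean of the state given each message.

```python
import math
import random

# Illustrative sketch only (not the paper's exact dynamics):
# a discretized cheap-talk game with quadratic losses.
random.seed(0)
N = 5            # grid of states, messages, and actions: 0..N-1
BIAS = 0.0       # aligned preferences (sender bias b = 0)
ROUNDS = 20000

# Sender: propensity q[s][m] for sending message m in state s.
q = [[1.0] * N for _ in range(N)]
# Receiver: running sums for estimating E[state | message].
msg_sum = [0.0] * N
msg_cnt = [1e-9] * N

def softmax_choice(weights, temp=0.5):
    """Sample an index with probability proportional to exp(w / temp)."""
    mx = max(weights)
    exps = [math.exp((w - mx) / temp) for w in weights]
    r = random.random() * sum(exps)
    for i, e in enumerate(exps):
        r -= e
        if r <= 0:
            return i
    return len(exps) - 1

for _ in range(ROUNDS):
    s = random.randrange(N)           # nature draws the state
    m = softmax_choice(q[s])          # sender picks a message from propensities
    a = msg_sum[m] / msg_cnt[m]       # receiver best-responds: posterior mean
    sender_payoff = -(a - (s + BIAS)) ** 2
    # Reward-driven adaptation: propensity drifts toward realized payoff.
    q[s][m] += 0.05 * (sender_payoff - q[s][m])
    # Receiver updates its belief about what each message indicates.
    msg_sum[m] += s
    msg_cnt[m] += 1

# Induced action per state: the receiver's response to the sender's
# currently most-reinforced message in that state.
induced = []
for s in range(N):
    m_star = max(range(N), key=lambda m: q[s][m])
    induced.append(msg_sum[m_star] / msg_cnt[m_star])
print([round(a, 2) for a in induced])
```

With `BIAS = 0.0`, the sketch corresponds to the aligned-preferences case, where the paper establishes convergence to informative communication; setting `BIAS > 0` makes the players' targets diverge, the misaligned case in which the paper finds cycling rather than a stable outcome. The update rule here tracks an average payoff per (state, message) pair, one simple stand-in for the unspecified learning algorithm.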