Sustaining Cooperation in Populations Guided by AI: A Folk Theorem for LLMs

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks how cooperation can be sustained among agents with misaligned incentives when large language models (LLMs) guide their behavior. Several LLMs each advise a population of clients who play instances of an underlying game, which induces a meta-game among the LLMs themselves. In the one-shot setting, shared instructions change equilibrium behavior only when one LLM can influence more than one role in the same interaction; in that case cooperation may emerge. In the repeated setting, the work proves a folk theorem for LLMs: even though clients cannot identify which LLM advised their opponents and behavior is observed only indirectly, all feasible and individually rational outcomes can be sustained as ε-equilibria. The result therefore holds under weaker identification and observability assumptions than the standard folk theorem uses, and it supports a broad range of cooperative equilibria under repeated interaction.
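For readers unfamiliar with the terms used above, the standard textbook definitions can be written as follows. The notation ($u_i$, $\sigma$, $\underline{v}_i$, $A$) is assumed here for illustration and is not taken from the paper, whose exact definitions may differ (for example, classical folk theorems often require strict rather than weak individual rationality).

```latex
% Textbook definitions; symbols are illustrative assumptions, not the paper's notation.
% epsilon-equilibrium: no player gains more than epsilon by deviating unilaterally.
\[
u_i(\sigma_i', \sigma_{-i}) \le u_i(\sigma) + \varepsilon
\quad \text{for every player } i \text{ and every deviation } \sigma_i'.
\]
% Feasible and individually rational payoff vectors v:
% v lies in the convex hull of stage-game payoffs and gives each player
% at least their minmax value.
\[
\underline{v}_i = \min_{\sigma_{-i}} \max_{\sigma_i} u_i(\sigma_i, \sigma_{-i}),
\qquad
v \in \operatorname{conv}\{\, u(a) : a \in A \,\},
\quad
v_i \ge \underline{v}_i \ \text{for all } i.
\]
```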

📝 Abstract

Large language models (LLMs) are increasingly used to provide instructions to many agents who interact with one another. Such shared reliance couples agents who appear to act independently: they may in fact be guided by a common model. This coupling can change the prospects for cooperation among agents with misaligned incentives. We study settings in which multiple LLMs each advise a population of clients who participate in instances of an underlying game, creating strategic interaction at the level of the LLMs themselves. This induces a meta-game among the LLMs, mediated through clients. We first analyze the one-shot setting, where shared instructions can change equilibrium behavior only when an LLM may influence more than one role in the same interaction; in such cases, cooperation may emerge, and the effect of client share can be beneficial, harmful, or non-monotone, depending on the base game. Our main result concerns the repeated setting. We prove a folk theorem for LLMs: despite indirect observation and the clients' inability to identify which LLM advised their opponents, all feasible and individually rational outcomes can be sustained as $\varepsilon$-equilibria. The result does not follow from the standard folk theorem and requires new proof techniques. Together, these results show that shared LLM guidance can sustain cooperation among populations of agents even when the underlying incentives are misaligned.
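The one-shot intuition can be illustrated with a toy model. The sketch below is not the paper's construction: it assumes a Prisoner's Dilemma with hypothetical payoffs (5, 3, 1, 0), exactly two LLMs whose client shares sum to one, uniformly random matching of clients, and an LLM utility equal to the average payoff of its own clients. Under these assumptions, a client of an LLM with share `s` meets another client of the same LLM with probability `s`, which is what lets one LLM influence both roles in an interaction.

```python
from itertools import product

# Hypothetical Prisoner's Dilemma payoffs to the row player (an assumption,
# not taken from the paper): (own action, opponent action) -> payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def llm_utility(share, own, other):
    """Average payoff of one LLM's clients in the toy model.

    With probability `share` a client meets a client of the same LLM
    (both play `own`); otherwise it meets a client of the other LLM.
    """
    return share * PAYOFF[(own, own)] + (1 - share) * PAYOFF[(own, other)]


def pure_equilibria(share_a):
    """Pure Nash equilibria of the two-LLM meta-game, as (action_a, action_b)."""
    share_b = 1 - share_a
    result = []
    for a, b in product(ACTIONS, repeat=2):
        # Each LLM's instruction must be a best response to the other's.
        a_ok = all(llm_utility(share_a, a, b) >= llm_utility(share_a, d, b)
                   for d in ACTIONS)
        b_ok = all(llm_utility(share_b, b, a) >= llm_utility(share_b, d, a)
                   for d in ACTIONS)
        if a_ok and b_ok:
            result.append((a, b))
    return result


if __name__ == "__main__":
    for s in (0.1, 0.9):
        print(s, pure_equilibria(s))
```

In this toy model the LLM with the larger client share instructs cooperation, because its clients mostly meet each other, while the smaller LLM instructs defection. Other base games or matching rules can give different and even non-monotone effects of client share, as the abstract notes.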

Problem

Research questions and friction points this paper is trying to address.

cooperation

large language models

multi-agent systems

game theory

incentive misalignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

folk theorem

large language models

meta-game

ε-equilibrium

multi-agent cooperation
