🤖 AI Summary
This study investigates how multi-agent systems built from large language models (LLMs) achieve cooperation in mixed-motive settings, such as the common-pool resource (CPR) dilemma, without explicit reward signals. Method: We propose a multi-agent framework integrating Ostrom's design principles for collective governance, norm-based punishment mechanisms, cultural-evolutionary modeling, and LLM-driven individual learning. Crucially, the framework removes global payoff visibility, so cooperation norms must emerge solely from local environmental feedback and social learning. Contribution/Results: Experiments replicate key empirical features of human cooperative behavior in CPR games and reveal systematic cross-model differences in norm maintenance. We also introduce the first standardized, quantifiable benchmark for evaluating how well LLMs develop and sustain social norms, a foundational step toward assessing socio-cognitive alignment in artificial agents.
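To make the framework concrete, here is a minimal sketch of one simulation round under illustrative assumptions: LLM-driven decisions are replaced by fixed harvest norms, and all names and parameter values are hypothetical rather than taken from the paper. The key property it illustrates is that agents never see a global payoff table; they observe only their own endowment and the state of the pool.

```python
import random

# Minimal sketch of one round of the CPR framework described above, under
# illustrative assumptions. LLM-driven decisions are replaced by fixed
# harvest norms; none of these names or values come from the paper.

REGROWTH_RATE = 0.15  # assumed logistic regrowth rate of the common pool
CAPACITY = 100.0      # assumed carrying capacity of the resource


class Agent:
    def __init__(self, altruistic: bool):
        # An agent's norm is simply how much it believes it may harvest.
        self.harvest_norm = 2.0 if altruistic else 6.0
        self.endowment = 0.0  # accumulated resource; the agent's only feedback

    def harvest(self, pool: float) -> float:
        take = min(self.harvest_norm, pool)
        self.endowment += take
        return take


def punish(agents, observed):
    # Norm-based punishment: peers sanction harvests above the median norm.
    norms = sorted(a.harvest_norm for a in agents)
    median_norm = norms[len(norms) // 2]
    for agent, take in observed.items():
        if take > median_norm:
            agent.endowment -= 1.0  # sanction borne by the violator


def social_learning(agents):
    # Cultural evolution: adopt the norm of a randomly sampled, more
    # successful peer (no global payoff table is ever consulted).
    for agent in agents:
        peer = random.choice(agents)
        if peer.endowment > agent.endowment:
            agent.harvest_norm = peer.harvest_norm


def step(agents, pool: float) -> float:
    observed = {}
    for agent in random.sample(agents, len(agents)):  # random harvest order
        observed[agent] = agent.harvest(pool)
        pool -= observed[agent]
    punish(agents, observed)
    social_learning(agents)
    # Logistic regrowth: the environment itself supplies all feedback.
    return pool + REGROWTH_RATE * pool * (1 - pool / CAPACITY)
```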
📝 Abstract
A growing body of multi-agent studies with Large Language Models (LLMs) explores how norms and cooperation emerge in mixed-motive scenarios, where pursuing individual gain can undermine the collective good. While prior work has explored these dynamics in both richly contextualized simulations and simplified game-theoretic environments, most LLM systems featuring common-pool resource (CPR) games provide agents with explicit reward functions directly tied to their actions. In contrast, human cooperation often emerges without full visibility into payoffs and population, relying instead on heuristics, communication, and punishment. We introduce a CPR simulation framework that removes explicit reward signals and embeds cultural-evolutionary mechanisms: social learning (adopting strategies and beliefs from successful peers) and norm-based punishment, grounded in Ostrom's principles of resource governance. Agents also individually learn from the consequences of harvesting, monitoring, and punishing via environmental feedback, enabling norms to emerge endogenously. We establish the validity of our simulation by reproducing key findings from existing studies on human behavior. Building on this, we examine norm evolution across a $2 \times 2$ grid of environmental and social initialisations (resource-rich vs. resource-scarce; altruistic vs. selfish) and benchmark how agentic societies composed of different LLMs perform under these conditions. Our results reveal systematic model differences in sustaining cooperation and norm formation, positioning the framework as a rigorous testbed for studying emergent norms in mixed-motive LLM societies. Such analysis can inform the design of AI systems deployed in social and organizational contexts, where alignment with cooperative norms is critical for stability, fairness, and effective governance of AI-mediated environments.
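As a sketch of how the $2 \times 2$ grid of initialisations might be enumerated for benchmarking, the snippet below crosses each environmental condition with each social condition. The condition names and parameter values are assumptions made for illustration, not the paper's actual configuration.

```python
from itertools import product

# Hypothetical enumeration of the 2x2 grid of initialisations described in
# the abstract (resource-rich vs. resource-scarce x altruistic vs. selfish).
# Parameter values here are illustrative assumptions.

RESOURCE_CONDITIONS = {"rich": 90.0, "scarce": 30.0}     # assumed initial pool
SOCIAL_CONDITIONS = {"altruistic": 0.8, "selfish": 0.2}  # assumed cooperator share


def build_conditions():
    """Cross environmental and social initialisations into four conditions."""
    return [
        {
            "resource": res_name,
            "initial_pool": pool,
            "population": soc_name,
            "cooperator_share": share,
        }
        for (res_name, pool), (soc_name, share) in product(
            RESOURCE_CONDITIONS.items(), SOCIAL_CONDITIONS.items()
        )
    ]


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition)  # one run per condition per LLM under test
```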