🤖 AI Summary
Indirect reciprocity and reputation-based cooperation critically depend on third-party moral judgment, yet the impact of large language models (LLMs) acting as societal “judges” on long-term human cooperative evolution remains unexplored.
Method: The authors develop an integrated framework combining 21 state-of-the-art LLMs with evolutionary game theory, standardized scenario-based evaluation, and goal-directed prompt engineering to assess LLM judgments across diverse cooperative contexts.
Contribution/Results: While LLMs exhibit high inter-model agreement in evaluating general cooperation, they show substantial disagreement—particularly regarding cooperation with low-reputation agents—significantly altering simulated population-level cooperation trajectories. Crucially, the study introduces controllable prompting to modulate LLMs’ implicit social norm preferences, empirically demonstrating targeted amplification or suppression of cooperative behavior. These findings establish LLMs not merely as analytical tools but as emergent institutional actors capable of shaping the evolution of human cooperation.
📝 Abstract
Humans increasingly rely on large language models (LLMs) to support decisions in social settings. Previous work suggests that such tools shape people's moral and political judgements. However, the long-term implications of LLM-based social decision-making remain unknown. How will human cooperation be affected when the assessment of social interactions relies on language models? This is a pressing question, as human cooperation is often driven by indirect reciprocity, reputations, and the capacity to judge interactions of others. Here, we assess how state-of-the-art LLMs judge cooperative actions. We provide 21 different LLMs with an extensive set of examples where individuals cooperate -- or refuse to cooperate -- in a range of social contexts, and ask how these interactions should be judged. Furthermore, through an evolutionary game-theoretical model, we evaluate cooperation dynamics in populations where the extracted LLM-driven judgements prevail, assessing the long-term impact of LLMs on human prosociality. We observe remarkable agreement in evaluating cooperation with good opponents. On the other hand, we notice within- and between-model variance when judging cooperation with ill-reputed individuals. We show that the differences revealed between models can significantly impact the prevalence of cooperation. Finally, we test prompts to steer LLM norms, showing that such interventions can shape LLM judgements, particularly through goal-oriented prompts. Our research connects LLM-based advice to long-term social dynamics, and highlights the need to carefully align LLM norms in order to preserve human cooperation.
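To make the indirect-reciprocity setting concrete: in such models, a social norm is a rule that assigns a donor a new reputation based on their action and the recipient's standing, and the norm in force determines how much cooperation a population can sustain. The sketch below is a minimal, hypothetical illustration of that mechanism (not the paper's actual model or its LLM-derived norms): a donation game among "discriminator" players who help only well-reputed partners, compared under two classic norms, Image Scoring and Stern Judging, with noisy public assessment. All parameter values and function names are illustrative assumptions.

```python
import random

# Minimal sketch (assumed parameters, not the paper's model): a donation game
# with reputation-based "discriminator" players under two illustrative norms.
# A norm maps (donor action, recipient reputation) -> donor's new reputation.
NORMS = {
    # Image Scoring: cooperating is always judged good, defecting always bad.
    "image_scoring": lambda action, rec_rep: action == "C",
    # Stern Judging: cooperating with the good or defecting on the bad is good;
    # anything else is judged bad.
    "stern_judging": lambda action, rec_rep: (action == "C") == rec_rep,
}

def simulate(norm_name, n=100, rounds=20000, assess_error=0.01, seed=0):
    """Return the fraction of cooperative acts in a population of discriminators."""
    rng = random.Random(seed)
    norm = NORMS[norm_name]
    rep = [True] * n                              # True = good reputation
    coop = 0
    for _ in range(rounds):
        donor, recipient = rng.sample(range(n), 2)
        action = "C" if rep[recipient] else "D"   # discriminators help the good
        coop += action == "C"
        new_rep = norm(action, rep[recipient])
        if rng.random() < assess_error:           # noisy public assessment
            new_rep = not new_rep
        rep[donor] = new_rep
    return coop / rounds

for name in NORMS:
    print(name, round(simulate(name), 3))
```

Under Stern Judging, discriminators are (up to assessment noise) always judged good, so cooperation stays near-universal; under Image Scoring, assessment errors propagate and cooperation can erode. This is the kind of norm-dependent divergence in long-run cooperation that the study probes when different LLM judgement rules are plugged into the evolutionary dynamics.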