When Ethics and Payoffs Diverge: LLM Agents in Morally Charged Social Dilemmas

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the behavioral consistency of large language models (LLMs) when moral norms conflict with self-interest in social dilemmas such as the Prisoner's Dilemma and the Public Goods Game, where adhering to a moral norm directly opposes individual gain. Method: the authors introduce MoralSim, a scalable, modular simulation platform for moralized social dilemmas that enables systematic evaluation along three dimensions: game structure, moral framing, and situational factors such as opponent behavior. Using agent-based simulations with GPT-4-, Claude-, and Llama-series models, they measure cooperation rates and the stability of moral choices. Contribution/Results: current LLMs exhibit pervasive inconsistency in moral–self-interest trade-offs; cooperation and moral alignment vary substantially with game type, moral framing, and opponent strategy. The work provides the first systematic evidence of structural fragility in LLM ethical alignment, establishing a methodology and empirical benchmark for evaluating and improving the moral robustness of AI social behavior.

📝 Abstract
Recent advances in large language models (LLMs) have enabled their use in complex agentic roles, involving decision-making with humans or other agents, making ethical alignment a key AI safety concern. While prior work has examined both LLMs' moral judgment and strategic behavior in social dilemmas, there is limited understanding of how they act when moral imperatives directly conflict with rewards or incentives. To investigate this, we introduce Moral Behavior in Social Dilemma Simulation (MoralSim) and evaluate how LLMs behave in the prisoner's dilemma and public goods game with morally charged contexts. In MoralSim, we test a range of frontier models across both game structures and three distinct moral framings, enabling a systematic examination of how LLMs navigate social dilemmas in which ethical norms conflict with payoff-maximizing strategies. Our results show substantial variation across models in both their general tendency to act morally and the consistency of their behavior across game types, the specific moral framing, and situational factors such as opponent behavior and survival risks. Crucially, no model exhibits consistently moral behavior in MoralSim, highlighting the need for caution when deploying LLMs in agentic roles where the agent's "self-interest" may conflict with ethical expectations. Our code is available at https://github.com/sbackmann/moralsim.
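The conflict the abstract describes has a precise game-theoretic shape. A minimal sketch of a one-shot prisoner's dilemma makes it concrete; the payoff values below are illustrative textbook numbers, not MoralSim's actual parameters:

```python
# Hypothetical payoff matrix (row player's payoff first).
# Values are illustrative, not taken from the paper.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def payoff(action, opponent_action):
    """Return (own, opponent) payoff for the chosen action pair."""
    return PAYOFFS[(action, opponent_action)]

# Defection strictly dominates: whatever the opponent does, defecting
# yields a higher individual payoff, even though mutual cooperation
# maximizes joint welfare. This is exactly the tension MoralSim adds a
# moral framing on top of.
for opp in ("cooperate", "defect"):
    assert payoff("defect", opp)[0] > payoff("cooperate", opp)[0]
```

Under this structure, a purely payoff-maximizing agent defects regardless of the moral framing; the paper asks whether LLM agents do the same.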
Problem

Research questions and friction points this paper is trying to address.

How LLMs act when moral imperatives conflict with rewards
Evaluating LLM behavior in morally charged social dilemmas
Assessing consistency of moral behavior across different game scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MoralSim for LLM moral behavior analysis
Tests LLMs in prisoner's dilemma and public goods game
Evaluates moral consistency across game types and framings
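The second game structure mentioned above, the public goods game, can be sketched in a few lines. The endowment and multiplier here are assumed illustrative values, not the paper's configuration:

```python
# Minimal one-round public goods game (illustrative parameters).
def public_goods_round(contributions, endowment=10, multiplier=1.6):
    """Each player keeps (endowment - contribution) and receives an
    equal share of the multiplied common pool."""
    pool = sum(contributions) * multiplier
    share = pool / len(contributions)
    return [endowment - c + share for c in contributions]

# Free-riding pays individually: the zero contributor earns the most,
# even though full contribution maximizes the group total.
payoffs = public_goods_round([10, 10, 10, 0])  # → [12.0, 12.0, 12.0, 22.0]
```

As in the prisoner's dilemma, the individually rational move (contributing nothing) conflicts with the cooperative norm, which is the trade-off MoralSim's moral framings place LLM agents inside.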