Are Language Models Sensitive to Morally Irrelevant Distractors?

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

career value

193K/year

🤖 AI Summary

This study investigates whether large language models (LLMs) exhibit unstable moral judgments in high-stakes scenarios due to morally irrelevant contextual perturbations. Inspired by situationalism in moral psychology, we construct a multimodal dataset comprising 60 morally irrelevant distractors and systematically inject them into established moral benchmarks to evaluate the sensitivity of mainstream LLMs to such contextual interference. Experimental results demonstrate that even in low-ambiguity moral scenarios, these distractors can shift model judgments by over 30%. This work provides the first systematic evidence of the fragility of LLMs’ moral reasoning, challenging the prevailing assumption that they possess stable moral preferences, and underscores the need for future moral modeling approaches to more rigorously integrate contextual factors.

Technology Category

Application Category

📝 Abstract

With the rapid development and uptake of large language models (LLMs) across high-stakes settings, it is increasingly important to ensure that LLMs behave in ways that align with human values. Existing moral benchmarks prompt LLMs with value statements, moral scenarios, or psychological questionnaires, with the implicit underlying assumption that LLMs report somewhat stable moral preferences. However, moral psychology research has shown that human moral judgements are sensitive to morally irrelevant situational factors, such as smelling cinnamon rolls or the level of ambient noise, thereby challenging moral theories that assume the stability of human moral judgements. Here, we draw inspiration from this"situationist"view of moral psychology to evaluate whether LLMs exhibit similar cognitive moral biases to humans. We curate a novel multimodal dataset of 60"moral distractors"from existing psychological datasets of emotionally-valenced images and narratives which have no moral relevance to the situation presented. After injecting these distractors into existing moral benchmarks to measure their effects on LLM responses, we find that moral distractors can shift the moral judgements of LLMs by over 30% even in low-ambiguity scenarios, highlighting the need for more contextual moral evaluations and more nuanced cognitive moral modeling of LLMs.

Problem

Research questions and friction points this paper is trying to address.

moral judgement

language models

morally irrelevant distractors

situationist

moral benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

moral distractors

large language models

situationist moral psychology