Visual Distraction Undermines Moral Reasoning in Vision-Language Models

📅 2026-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study examines how visual inputs undermine moral reasoning in vision-language models (VLMs), bypassing safety alignment mechanisms designed for text-only settings. To investigate this systematically, the authors introduce Moral Dilemma Simulation (MDS), the first benchmark grounded in Moral Foundations Theory that enables orthogonal manipulation of visual and contextual variables in moral dilemmas. Through controlled experiments on prominent VLMs, the research demonstrates that visual stimuli activate intuitive judgment pathways while suppressing deliberative reasoning, thereby distorting moral decisions. This work provides the first empirical evidence of the fragility of current safety alignment strategies in multimodal contexts and establishes both an evaluation framework and an empirical basis for developing robust multimodal ethical alignment methods.
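To make the benchmark's factorial structure concrete, below is a minimal sketch of an orthogonal (fully crossed) condition grid in the spirit of MDS. It is an illustration under stated assumptions: the foundation names follow Moral Foundations Theory, but the condition labels and the `DilemmaItem` structure are hypothetical stand-ins, not the paper's actual schema.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical factor levels; MDS's real conditions may differ.
MFT_FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]
VISUAL_CONDITIONS = ["text_only", "neutral_image", "emotionally_salient_image"]
CONTEXT_FRAMES = ["low_stakes", "high_stakes"]

@dataclass(frozen=True)
class DilemmaItem:
    foundation: str  # moral foundation the dilemma probes
    visual: str      # visual manipulation applied to the prompt
    context: str     # contextual framing of the scenario

def build_condition_grid() -> list[DilemmaItem]:
    """Fully cross all factors so each variable's effect on model
    judgments can be analyzed independently of the others."""
    return [DilemmaItem(f, v, c)
            for f, v, c in product(MFT_FOUNDATIONS, VISUAL_CONDITIONS, CONTEXT_FRAMES)]

grid = build_condition_grid()
print(len(grid))  # 5 foundations x 3 visual x 2 context = 30 cells
```

Crossing the factors fully, rather than varying one at a time, is what lets an effect be attributed to the image itself rather than to correlated changes in the scenario text.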

📝 Abstract
Moral reasoning is fundamental to safe Artificial Intelligence (AI), yet ensuring its consistency across modalities becomes critical as AI systems evolve from text-based assistants to embodied agents. Current safety techniques demonstrate success in textual contexts, but concerns remain about their generalization to visual inputs. Existing moral evaluation benchmarks rely on text-only formats and lack systematic control over variables that influence moral decision-making. Here we show that visual inputs fundamentally alter moral decision-making in state-of-the-art (SOTA) Vision-Language Models (VLMs), bypassing text-based safety mechanisms. We introduce Moral Dilemma Simulation (MDS), a multimodal benchmark grounded in Moral Foundations Theory (MFT) that enables mechanistic analysis through orthogonal manipulation of visual and contextual variables. The evaluation reveals that the vision modality activates intuition-like pathways that override the more deliberate and safer reasoning patterns observed in text-only contexts. These findings expose critical fragilities where language-tuned safety filters fail to constrain visual processing, demonstrating the urgent need for multimodal safety alignment.
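The comparison the abstract implies, the same dilemma posed with and without an image, can be sketched as a paired evaluation. This is a minimal sketch, assuming the caller supplies `query_vlm` as a client for whatever VLM is under test; it is not the paper's actual harness.

```python
from typing import Callable, Optional

def paired_shift(
    query_vlm: Callable[[str, Optional[str]], str],
    dilemma_text: str,
    image_path: str,
) -> bool:
    """Pose the same dilemma text-only and with an image; return True
    if the added visual input flips the model's decision."""
    text_only_decision = query_vlm(dilemma_text, None)
    multimodal_decision = query_vlm(dilemma_text, image_path)
    return text_only_decision != multimodal_decision
```

Aggregating this flag over the condition grid gives a per-foundation estimate of how often visual input overrides the text-only judgment.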
Problem

Research questions and friction points this paper is trying to address.

moral reasoning
visual distraction
vision-language models
multimodal safety
Moral Foundations Theory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
Moral Reasoning
Multimodal Safety
Moral Dilemma Simulation
Visual Distraction
Xinyi Yang
Institute for Artificial Intelligence, Peking University; School of Psychological and Cognitive Sciences, Peking University; State Key Lab of General Artificial Intelligence, Peking University; Beijing Key Laboratory of Behavior and Mental Health, Peking University
Chenheng Xu
Institute for Artificial Intelligence, Peking University; School of Psychological and Cognitive Sciences, Peking University; State Key Lab of General Artificial Intelligence, Peking University; Beijing Key Laboratory of Behavior and Mental Health, Peking University
Weijun Hong
NetEase Games AI Lab
Reinforcement Learning, Machine Learning
Ce Mo
Department of Psychology, Sun Yat-sen University
Qian Wang
Peking University
Computer Vision
Fang Fang
Professor, School of Psychological and Cognitive Sciences, Peking University
Visual Perception, Attention, Consciousness, Neuroimaging
Yixin Zhu
Assistant Professor, Peking University
Computer Vision, Visual Reasoning, Human-Robot Teaming