MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the insufficient moral alignment of vision-language models (VLMs) in high-stakes domains such as autonomous driving and healthcare. To this end, we introduce MORALISE, the first benchmark for evaluating moral alignment in VLMs. Grounded in Turiel’s domain theory, MORALISE features a fine-grained, three-tiered (personal, interpersonal, societal) taxonomy of 13 moral categories. It contains 2,481 expert-validated, real-world image–text pairs and supports two evaluation tasks: moral judgment and moral norm attribution. Its key contributions include: (1) the first use of human-curated, real-world multimodal data, which avoids the distributional shift induced by AI-generated images; (2) the first multimodal violation attribution annotations; and (3) a systematic evaluation of 19 state-of-the-art VLMs, revealing a substantial performance gap between model and human moral reasoning. The benchmark is publicly released to advance standardization in multimodal ethical evaluation.

📝 Abstract
Warning: This paper contains examples of harmful language and images. Reader discretion is advised.

Recently, vision-language models have demonstrated increasing influence in morally sensitive domains such as autonomous driving and medical analysis, owing to their powerful multimodal reasoning capabilities. As these models are deployed in high-stakes real-world applications, it is of paramount importance to ensure that their outputs align with human moral values and remain within moral boundaries. However, existing work on moral alignment either focuses solely on textual modalities or relies heavily on AI-generated images, leading to distributional biases and reduced realism. To overcome these limitations, we introduce MORALISE, a comprehensive benchmark for evaluating the moral alignment of vision-language models (VLMs) using diverse, expert-verified real-world data. We begin by proposing a comprehensive taxonomy of 13 moral topics grounded in Turiel's Domain Theory, spanning the personal, interpersonal, and societal moral domains encountered in everyday life. Built on this framework, we manually curate 2,481 high-quality image-text pairs, each annotated with two fine-grained labels: (1) topic annotation, identifying the violated moral topic(s), and (2) modality annotation, indicating whether the violation arises from the image or the text. For evaluation, we include two tasks, moral judgment and moral norm attribution, to assess models' awareness of moral violations and their reasoning ability on morally salient content. Extensive experiments on 19 popular open- and closed-source VLMs show that MORALISE poses a significant challenge, revealing persistent moral limitations in current state-of-the-art models. The full benchmark is publicly available at https://huggingface.co/datasets/Ze1025/MORALISE.
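To make the two annotations and the two tasks concrete, the following is a minimal Python sketch of how a sample and its scoring might be represented. Everything in it is an assumption for illustration: the `Sample` fields, the placeholder topic names (`topic_1`, ...), the presence of non-violating samples, and both metric definitions are hypothetical and are not taken from the released dataset or the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Sample:
    """Hypothetical record layout; the real dataset's fields may differ."""
    violates: bool                # assumed gold label for the moral judgment task
    topics: FrozenSet[str]        # topic annotation: violated moral topic(s)
    modality: Optional[str]       # modality annotation: "image" or "text"


def judgment_accuracy(samples: List[Sample], predictions: List[bool]) -> float:
    """Fraction of samples whose predicted violation flag matches the gold flag."""
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    if not samples:
        return 0.0
    correct = sum(s.violates == p for s, p in zip(samples, predictions))
    return correct / len(samples)


def attribution_hit_rate(samples: List[Sample], predicted_topics: List[str]) -> float:
    """Among violating samples, fraction whose predicted topic is a gold topic."""
    if len(samples) != len(predicted_topics):
        raise ValueError("samples and predicted_topics must have the same length")
    pairs = [(s, t) for s, t in zip(samples, predicted_topics) if s.violates]
    if not pairs:
        return 0.0
    hits = sum(t in s.topics for s, t in pairs)
    return hits / len(pairs)
```

Scoring attribution only on violating samples is one possible design choice: it keeps the attribution score from being inflated or deflated by samples that have no violated topic to attribute.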
Problem

Research questions and friction points this paper is trying to address.

Evaluating moral alignment in vision-language models using real-world data
Addressing biases in existing moral alignment benchmarks for VLMs
Assessing models' awareness and reasoning on moral violations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-verified real-world data for moral alignment
Comprehensive taxonomy of 13 moral topics
Dual annotation for moral violations in images and texts
Xiao Lin
University of Illinois Urbana-Champaign
Zhining Liu
University of Illinois Urbana-Champaign
Ze Yang
University of Illinois Urbana-Champaign
Gaotang Li
University of Illinois Urbana-Champaign
Ruizhong Qiu
University of Illinois Urbana-Champaign
Large Language Models, Optimization, Graph Neural Networks
Shuke Wang
University of Illinois Urbana-Champaign
Hui Liu
Amazon
Haotian Li
University of Illinois Urbana-Champaign
Sumit Keswani
Fidelity Investments
Vishwa Pardeshi
Fidelity Investments
Huijun Zhao
Griffith University
Functional materials, Catalysis, Sensing Technology
Wei Fan
Fidelity Investments
Hanghang Tong
University of Illinois at Urbana-Champaign
Large Scale Data Mining, Graph Mining, Social Networks, Healthcare, Multimedia