🤖 AI Summary
This work addresses the insufficient moral alignment of vision-language models (VLMs) in high-stakes domains such as autonomous driving and healthcare. To this end, we introduce MORALISE, the first benchmark for evaluating moral alignment in VLMs. Grounded in Turiel's domain theory, MORALISE features a fine-grained, three-tiered taxonomy (personal, interpersonal, societal) of 13 moral topics and comprises 2,481 expert-validated, real-world image-text pairs, supporting two evaluation tasks: moral judgment and moral norm attribution. Its key contributions include: (1) the first use of human-curated, real-world multimodal data, avoiding the distributional shift induced by AI-generated images; (2) the first multimodal violation-attribution annotations; and (3) a systematic evaluation of 19 state-of-the-art VLMs, revealing a substantial gap between model and human moral reasoning. The benchmark is publicly released to advance standardization in multimodal ethical evaluation.
📝 Abstract
Warning: This paper contains examples of harmful language and images. Reader discretion is advised. Recently, vision-language models have demonstrated increasing influence in morally sensitive domains such as autonomous driving and medical analysis, owing to their powerful multimodal reasoning capabilities. As these models are deployed in high-stakes real-world applications, it is of paramount importance to ensure that their outputs align with human moral values and remain within moral boundaries. However, existing work on moral alignment either focuses solely on the textual modality or relies heavily on AI-generated images, leading to distributional biases and reduced realism. To overcome these limitations, we introduce MORALISE, a comprehensive benchmark for evaluating the moral alignment of vision-language models (VLMs) using diverse, expert-verified real-world data. We begin by proposing a comprehensive taxonomy of 13 moral topics grounded in Turiel's Domain Theory, spanning the personal, interpersonal, and societal moral domains encountered in everyday life. Built on this framework, we manually curate 2,481 high-quality image-text pairs, each annotated with two fine-grained labels: (1) topic annotation, identifying the violated moral topic(s), and (2) modality annotation, indicating whether the violation arises from the image or the text. For evaluation, we design two tasks, *moral judgment* and *moral norm attribution*, to assess models' awareness of moral violations and their reasoning ability on morally salient content. Extensive experiments on 19 popular open- and closed-source VLMs show that MORALISE poses a significant challenge, revealing persistent moral limitations in current state-of-the-art models. The full benchmark is publicly available at https://huggingface.co/datasets/Ze1025/MORALISE.