🤖 AI Summary
This study investigates whether multimodal large language models (MLLMs) can recapitulate human affective perception of images (valence, arousal, and basic discrete emotions) without supervision. Using standardized affective image datasets (e.g., IAPS) and state-of-the-art vision–language pre-trained models, the approach relies on contrastive image–text learning with no explicit affective supervision, so that affective concepts are modeled implicitly through cross-modal statistical regularities. Model-predicted affective dimensions align strongly (r > 0.8) with population-level human ratings, providing the first empirical evidence that affective perception can emerge spontaneously from unsupervised multimodal representation learning. Crucially, ablation analyses reveal that the language modality plays a pivotal role in affective concept formation, pointing to a structural guidance mechanism whereby textual priors shape visual affect understanding. These findings offer novel, cognitively grounded evidence for the interpretability and psychological plausibility of AI-based affective intelligence.
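For intuition, here is a minimal sketch of how an affective dimension could be read out zero-shot from a contrastive vision–language model. The checkpoint name, prompt wording, two-anchor scoring scheme, and image path are illustrative assumptions, not the protocol used in the study.

```python
# Hypothetical zero-shot valence readout from a CLIP-style model.
# Checkpoint, anchor prompts, and scoring are illustrative assumptions only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Text anchors for the two poles of the valence dimension.
PROMPTS = ["a very unpleasant, distressing scene",
           "a very pleasant, enjoyable scene"]

def valence_score(image: Image.Image) -> float:
    """Return the softmax probability mass on the 'pleasant' anchor (0..1)."""
    inputs = processor(text=PROMPTS, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape (1, 2)
    return logits.softmax(dim=-1)[0, 1].item()

score = valence_score(Image.open("example.jpg").convert("RGB"))
print(f"zero-shot valence proxy: {score:.2f}")
```

Arousal or discrete-emotion scores could be obtained the same way by swapping in the corresponding anchors (e.g., "calm" vs. "exciting", or one prompt per basic emotion).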
📝 Abstract
Affective reactions have deep biological foundations; in humans, however, the development of emotion concepts is also shaped by language and higher-order cognition. A recent breakthrough in AI has been the creation of multimodal language models that exhibit impressive intellectual capabilities, but their responses to affective stimuli have not been investigated. Here we study whether state-of-the-art multimodal systems can emulate human emotional ratings on a standardized set of images, in terms of both affective dimensions and basic discrete emotions. The AI judgements correlate surprisingly well with the average human ratings: given that these systems were not explicitly trained to match human affective reactions, this suggests that the ability to visually judge emotional content can emerge from statistical learning over large-scale databases of images paired with linguistic descriptions. Besides showing that language can support the development of rich emotion concepts in AI, these findings have broad implications for the sensitive use of multimodal AI technology.
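The alignment between model and human judgements is naturally quantified as a correlation between per-image model scores and population-mean ratings. A minimal sketch of that comparison, with synthetic numbers standing in for the actual per-image data:

```python
# Illustrative correlation analysis; the ten values below are synthetic
# stand-ins for per-image scores, not data from the study.
import numpy as np
from scipy.stats import pearsonr, spearmanr

human_valence = np.array([2.1, 3.4, 4.8, 5.0, 5.5, 6.2, 6.9, 7.3, 7.8, 8.4])  # e.g. 1-9 SAM scale
model_valence = np.array([0.12, 0.25, 0.41, 0.47, 0.55, 0.58, 0.70, 0.76, 0.81, 0.90])

r, p = pearsonr(model_valence, human_valence)
rho, _ = spearmanr(model_valence, human_valence)
print(f"Pearson r = {r:.2f} (p = {p:.1e}); Spearman rho = {rho:.2f}")
```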