Visual Affect Analysis: Predicting Emotions of Image Viewers with Vision-Language Models

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capacity of vision-language models (VLMs) to predict viewers' emotional responses to images in a zero-shot setting and evaluates their alignment with human psychometric ratings. We systematically assess nine open- and closed-source VLMs on three standardized affective image datasets, covering discrete emotion classification (6- and 12-class) and continuous dimensional rating (valence and arousal), and introduce viewer-conditioned prompts for the first time to explore personalized modeling. Results show that VLMs achieve 60%–80% accuracy in six-class emotion classification and correlate strongly (r > 0.75) with human valence ratings, though more weakly with arousal, with a consistent tendency to overestimate emotional intensity. The impact of conditioning prompts remains limited. This work presents the first systematic evaluation of VLMs' affective alignment across multiple datasets, models, and fine-grained emotional tasks.

Technology Category

Application Category

📝 Abstract
Vision-language models (VLMs) show promise as tools for inferring affect from visual stimuli at scale, but it is not yet clear how closely their outputs align with human affective ratings. We benchmarked nine VLMs, ranging from state-of-the-art proprietary models to open-source models, on three psychometrically validated affective image datasets: the International Affective Picture System, the Nencki Affective Picture System, and the Library of AI-Generated Affective Images. The models performed two tasks in the zero-shot setting: (i) top-emotion classification (selecting the strongest discrete emotion elicited by an image) and (ii) continuous prediction of human ratings on 1–7/9 Likert scales for discrete emotion categories and affective dimensions. We also evaluated the impact of rater-conditioned prompting on the LAI-GAI dataset using de-identified participant metadata. The results show good performance in discrete emotion classification, with accuracies typically ranging from 60% to 80% on six-emotion labels and from 60% to 75% on a more challenging 12-category task. Anger and surprise were predicted with the lowest accuracy across all datasets. For continuous rating prediction, models showed moderate to strong alignment with human ratings (r > 0.75) but also exhibited consistent biases, notably weaker performance on arousal and a tendency to overestimate response strength. Rater-conditioned prompting produced only small, inconsistent changes in predictions. Overall, VLMs capture broad affective trends but lack the nuance found in validated psychological ratings, highlighting both their potential and their current limitations for affective computing and mental health-related applications.
Problem

Research questions and friction points this paper is trying to address.

Visual Affect Analysis
Vision-Language Models
Emotion Prediction
Affective Computing
Human Affective Ratings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
Affective Computing
Zero-Shot Emotion Prediction
Rater-Conditioned Prompting
Psychometric Benchmarking
🔎 Similar Papers
2024-05-14 · IEEE/RSJ International Conference on Intelligent Robots and Systems · Citations: 2
Filip Nowicki
Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
Hubert Marciniak
Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
Jakub Łączkowski
Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
Krzysztof Jassem
Adam Mickiewicz University
Natural Language Processing
Tomasz Górecki
Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznań, Poland
Vimala Balakrishnan
Faculty of Computer Science and Information Technology, Universiti Malaya, Malaysia; Department of Computer Science and Engineering, Korea University, Korea
Desmond C. Ong
Assistant Professor of Psychology, The University of Texas at Austin
Affective Cognition · Emotions · Empathy · Affective Computing
Maciej Behnke
Cognitive Neuroscience Center, Adam Mickiewicz University, Poznań, Poland