Reading Smiles: Proxy Bias in Foundation Models for Facial Emotion Recognition

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the visual cues underlying foundation models' (FMs) facial emotion recognition (FER) and their psychological validity, focusing on proxy-bias-induced shortcut learning and fairness risks. Using a teeth-annotated AffectNet subset, we conduct zero-shot FER and structured attribution analysis on vision-language models (VLMs) of varying scale. We systematically show that tooth visibility serves as a strong proxy cue that substantially biases model predictions. Although models like GPT-4o exhibit internally consistent valence-arousal response patterns, they rely heavily on superficial facial attributes, such as eyebrow position, rather than deeper psychological representations. These findings expose latent bias mechanisms in current FMs when deployed in sensitive domains (e.g., mental health, education), challenging their reliability and equity. Our study provides critical empirical evidence for interpretable FER and fair AI design, highlighting the need to mitigate spurious correlations in multimodal affective modelling.

📝 Abstract
Foundation Models (FMs) are rapidly transforming Affective Computing (AC), with Vision Language Models (VLMs) now capable of recognising emotions in zero-shot settings. This paper probes a critical but underexplored question: what visual cues do these models rely on to infer affect, and are these cues psychologically grounded or superficially learnt? We benchmark VLMs of varying scale on a teeth-annotated subset of the AffectNet dataset and find consistent performance shifts depending on the presence of visible teeth. Through structured introspection of the best-performing model, GPT-4o, we show that facial attributes like eyebrow position drive much of its affective reasoning, revealing a high degree of internal consistency in its valence-arousal predictions. These patterns highlight the emergent nature of FM behaviour, but also reveal risks: shortcut learning, bias, and fairness issues, especially in sensitive domains like mental health and education.
Problem

Research questions and friction points this paper is trying to address.

Investigates visual cues VLMs use for emotion recognition
Examines psychological grounding versus superficial learning in FMs
Reveals shortcut learning and bias risks in affective computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarking VLMs on teeth-annotated AffectNet dataset
Introspecting GPT-4o for facial attribute-driven reasoning
Analyzing shortcut learning and bias risks
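The teeth-visibility analysis above can be illustrated with a minimal sketch: stratify a model's FER accuracy by whether teeth are visible, so a proxy-cue bias shows up as a gap between the two strata. All data and field names here are hypothetical for illustration; the paper's actual pipeline uses AffectNet annotations and VLM predictions.

```python
from collections import defaultdict

def stratified_accuracy(records):
    """Compute FER accuracy separately for faces with and without
    visible teeth, exposing a teeth-visibility proxy bias.

    Each record is a dict with keys 'teeth_visible' (bool),
    'true_label' (str), and 'pred_label' (str).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        key = "teeth" if r["teeth_visible"] else "no_teeth"
        totals[key] += 1
        hits[key] += int(r["pred_label"] == r["true_label"])
    return {k: hits[k] / totals[k] for k in totals}

# Toy records (hypothetical): a model shortcutting on visible teeth
# tends to over-predict 'happy' whenever teeth are showing.
records = [
    {"teeth_visible": True,  "true_label": "happy", "pred_label": "happy"},
    {"teeth_visible": True,  "true_label": "angry", "pred_label": "happy"},
    {"teeth_visible": False, "true_label": "happy", "pred_label": "neutral"},
    {"teeth_visible": False, "true_label": "sad",   "pred_label": "sad"},
]
print(stratified_accuracy(records))  # → {'teeth': 0.5, 'no_teeth': 0.5}
```

A large accuracy gap between the `teeth` and `no_teeth` strata (or a skew toward 'happy' in the teeth stratum) is the kind of performance shift the benchmark reports.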
Iosif Tsangko
PhD Student, Technische Universität München
Machine Learning, Deep Learning, Signal Processing, Natural Language Processing
Andreas Triantafyllopoulos
Technical University of Munich
machine learning, affective computing, computer audition
Adem Abdelmoula
CHI – Chair of Health Informatics, MRI, Technical University of Munich, Germany
Adria Mallol-Ragolta
Researcher, Chair of Health Informatics, Technical University of Munich
AI, Deep Learning, Digital Health, mHealth, Affective Computing
Björn W. Schuller
CHI – Chair of Health Informatics, MRI, Technical University of Munich, Germany; MCML – Munich Center for Machine Learning, Munich, Germany; GLAM – Group on Language, Audio, & Music, Imperial College London, UK; MDSI – Munich Data Science Institute, Munich, Germany