🤖 AI Summary
In dynamic social contexts, emotion recognition requires joint modeling of facial dynamics and situational cues. To address this, we propose a Bayesian Cue Integration (BCI)-based saliency-adjustment framework that dynamically weights facial expressions and contextual information for context-adaptive emotion discrimination. Our method leverages vision-language models (VLMs) to extract multimodal representations and incorporates a learnable saliency gating mechanism to optimize cue fusion. Evaluated on real-world social博弈 scenarios—including the Prisoner’s Dilemma—the framework achieves an 8.3% absolute improvement in emotion recognition accuracy over static fusion and unimodal baselines. This work is the first to systematically integrate Bayesian cue integration into dynamic-context emotion recognition, establishing a novel, interpretable, and scalable paradigm for multimodal affective computing in complex social interactions.
📝 Abstract
Emotion recognition in dynamic social contexts requires an understanding of the complex interaction between facial expressions and situational cues. This paper presents a salience-adjusted framework for context-aware emotion recognition with Bayesian Cue Integration (BCI) and Visual-Language Models (VLMs) to dynamically weight facial and contextual information based on the expressivity of facial cues. We evaluate this approach using human annotations and automatic emotion recognition systems in prisoner's dilemma scenarios, which are designed to evoke emotional reactions. Our findings demonstrate that incorporating salience adjustment enhances emotion recognition performance, offering promising directions for future research to extend this framework to broader social contexts and multimodal applications.