A Trustworthy Method for Multimodal Emotion Recognition

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multimodal emotion recognition methods primarily focus on improving accuracy while neglecting prediction reliability, rendering them vulnerable to noise, corrupted inputs, and out-of-distribution data. To address this, we propose TER (Trustworthy Emotion Recognition), a framework that incorporates uncertainty estimation into multimodal emotion recognition. TER employs confidence-weighted fusion of modality-specific outputs and introduces a novel evaluation suite—including trustworthy accuracy, trustworthy recall, and trustworthy F1—to quantify predictive reliability. It jointly optimizes uncertainty quantification, robust multimodal fusion, and adaptive threshold-based decision-making in an end-to-end manner. Evaluated on the IEMOCAP and Music-video datasets, TER achieves trusted F1 scores of 0.7511 and 0.9035, respectively, along with an 82.40% accuracy on Music-video, consistently outperforming state-of-the-art methods. This demonstrates substantial improvements in model reliability and robustness under distributional shifts and input perturbations.

📝 Abstract
Existing emotion recognition methods mainly focus on enhancing performance by employing complex deep models, typically resulting in significantly higher model complexity. While such models are effective, it is also crucial to ensure the reliability of the final decision, especially for noisy, corrupted, and out-of-distribution data. To this end, we propose a novel emotion recognition method called trusted emotion recognition (TER), which uses uncertainty estimation to calculate the confidence value of each prediction. TER combines the results from multiple modalities based on their confidence values to output trusted predictions. We also provide a new evaluation criterion to assess the reliability of predictions. Specifically, we use trusted precision and trusted recall to determine the trusted threshold, and formulate the trusted accuracy and trusted F1 score to evaluate the model's trusted performance. The proposed framework incorporates a confidence module that endows the model with reliability and robustness against possible noise or corruption. Extensive experimental results validate the effectiveness of the proposed model. TER achieves state-of-the-art performance on the Music-video dataset with 82.40% accuracy. In terms of trusted performance, TER outperforms other methods on IEMOCAP and Music-video, achieving trusted F1 scores of 0.7511 and 0.9035, respectively.
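The page does not give the exact formulas for the trusted metrics, but a plausible reading of the abstract is that a prediction counts as "trusted" when its confidence exceeds a threshold. A minimal sketch under that assumption (the function name, threshold value, and exact definitions are illustrative, not taken from the paper):

```python
import numpy as np

def trusted_metrics(y_true, y_pred, confidence, tau=0.5):
    """Toy reading of trusted precision/recall/F1: a prediction is
    'trusted' when its confidence exceeds the threshold tau.
    These definitions are assumptions, not the paper's."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    trusted = np.asarray(confidence) > tau
    correct = y_pred == y_true
    tp = np.sum(trusted & correct)
    # trusted precision: share of trusted predictions that are correct
    precision = tp / max(np.sum(trusted), 1)
    # trusted recall: share of correct predictions that are trusted
    recall = tp / max(np.sum(correct), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

# One confident wrong-ish modality output is rejected (conf 0.4 < tau)
p, r, f1 = trusted_metrics([0, 1, 1, 0], [0, 1, 0, 0],
                           [0.9, 0.8, 0.4, 0.6])
```

Sweeping `tau` over a validation set, as the abstract suggests via trusted precision/recall, would then select the operating threshold.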
Problem

Research questions and friction points this paper is trying to address.

Ensuring reliable emotion recognition with noisy data
Reducing model complexity in multimodal emotion recognition
Evaluating prediction reliability using trusted metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty estimation for prediction confidence
Multimodal fusion based on confidence values
New trusted metrics for reliability evaluation
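The fusion idea above can be sketched as weighting each modality's class probabilities by its estimated confidence. This is an illustrative reading, not the paper's exact fusion rule (TER's confidence module and fusion are learned end-to-end; the weighting below is a hand-written stand-in):

```python
import numpy as np

def confidence_weighted_fusion(probs_per_modality, confidences):
    """Sketch of confidence-based multimodal fusion: weight each
    modality's class distribution by its (normalized) confidence."""
    probs = np.asarray(probs_per_modality, dtype=float)  # shape (M, C)
    w = np.asarray(confidences, dtype=float)             # shape (M,)
    w = w / w.sum()                                      # normalize weights
    return (w[:, None] * probs).sum(axis=0)              # fused (C,)

# A confident audio branch dominates an uncertain video branch
audio = [0.7, 0.2, 0.1]
video = [0.3, 0.4, 0.3]
fused = confidence_weighted_fusion([audio, video], [0.9, 0.3])
# fused stays a valid distribution; the predicted class follows audio
```

With weights 0.75 and 0.25 after normalization, the fused distribution is [0.6, 0.25, 0.15], so the low-confidence video modality perturbs but does not flip the decision.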
Junxiao Xue
Zhejiang Lab
Computer Graphics, Crowd Simulation, Multi-agent Modeling, Multi-modal Learning
Xiaozhen Liu
Zhengzhou University
Computer Vision, Multimodal Learning
Jie Wang
China Mobile (Hangzhou) Information Technology Co. Ltd., Hangzhou, 311100, China
Xuecheng Wu
School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
Bin Wu
School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, 450001, China