Enhancing Facial Expression Recognition through Dual-Direction Attention Mixed Feature Networks and CLIP: Application to 8th ABAW Challenge

📅 2025-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the 8th ABAW Challenge at CVPR 2025, tackling three facial affect analysis tasks as independent challenges: valence-arousal (VA) estimation, discrete emotion recognition, and facial action unit (AU) detection. The authors apply the Dual-Direction Attention Mixed Feature Network (DDAMFN) to all three tasks and, as an additional experiment, explore the CLIP vision-language model for the emotion recognition task. Evaluated on the official ABAW test sets, their methods surpass the proposed baselines on all three tasks, and the paper offers insights into the architectural choices behind this performance.
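The dual-direction attention idea named above can be sketched as attention computed separately along the two spatial directions of a feature map and then recombined. The snippet below is an illustrative approximation of that pattern, not the authors' exact DDAMFN module; the pooling and sigmoid choices are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dual_direction_attention(x):
    """Illustrative sketch of dual-direction spatial attention.

    x: feature map of shape (C, H, W). Attention maps are built from
    the two directional poolings and used to re-weight the input.
    This is an approximation for exposition, not the DDAMFN module.
    """
    # Pool along the width axis -> per-row descriptor of shape (C, H, 1)
    a_h = sigmoid(x.mean(axis=2, keepdims=True))
    # Pool along the height axis -> per-column descriptor of shape (C, 1, W)
    a_w = sigmoid(x.mean(axis=1, keepdims=True))
    # Broadcast both directional attention maps over the input
    return x * a_h * a_w

feat = np.random.randn(64, 7, 7)
out = dual_direction_attention(feat)
print(out.shape)  # (64, 7, 7)
```

In the full network, the directional descriptors would pass through learned layers before the sigmoid; the fixed pooling here only conveys the two-direction structure.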

📝 Abstract
We present our contribution to the 8th ABAW challenge at CVPR 2025, where we tackle valence-arousal estimation, emotion recognition, and facial action unit detection as three independent challenges. Our approach leverages the well-known Dual-Direction Attention Mixed Feature Network (DDAMFN) for all three tasks, achieving results that surpass the proposed baselines. Additionally, we explore the use of CLIP for the emotion recognition challenge as an additional experiment. We provide insights into the architectural choices that contribute to the strong performance of our methods.
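A common way to reuse CLIP for emotion recognition, consistent with the exploratory experiment the abstract describes, is a linear probe on frozen image embeddings. The sketch below illustrates only that general pattern: the random vectors stand in for CLIP image-encoder outputs, the 512-d width and the probe weights are hypothetical, and only the class count of 8 (the ABAW expression set) comes from the challenge setup.

```python
import numpy as np

# Hypothetical stand-ins: in practice these vectors would come from a
# frozen CLIP image encoder; random vectors are used here instead.
rng = np.random.default_rng(0)
n_images, embed_dim, n_emotions = 4, 512, 8  # 8 ABAW expression classes

clip_embeddings = rng.standard_normal((n_images, embed_dim))

# Linear probe: one learned layer mapping embeddings to emotion logits
W = rng.standard_normal((embed_dim, n_emotions)) * 0.01
b = np.zeros(n_emotions)

logits = clip_embeddings @ W + b

# Numerically stable softmax over the emotion classes
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

pred = probs.argmax(axis=1)  # one predicted class index per image
print(pred.shape)  # (4,)
```

Training the probe (e.g. with cross-entropy) while keeping the CLIP backbone frozen is the standard way to test how transferable its representations are to a new label set.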
Problem

Research questions and friction points this paper is trying to address.

Improves facial expression recognition accuracy
Addresses valence-arousal estimation and emotion recognition
Enhances facial action unit detection performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Direction Attention Mixed Feature Networks
CLIP for emotion recognition enhancement
Architectural insights for superior performance
Josep Cabacas-Maso
eHealth Center, Faculty of Computer Science, Multimedia and Telecommunication, Universitat Oberta de Catalunya, 08018 Barcelona, Spain
Elena Ortega-Beltrán
eHealth Center, Faculty of Computer Science, Multimedia and Telecommunication, Universitat Oberta de Catalunya, 08018 Barcelona, Spain
Ismael Benito-Altamirano
eHealth Center, Faculty of Computer Science, Multimedia and Telecommunication, Universitat Oberta de Catalunya, 08018 Barcelona, Spain; MIND/IN2UB, Department of Electronic and Biomedical Engineering, Universitat de Barcelona, 08028 Barcelona, Spain
Carles Ventura
Universitat Oberta de Catalunya (UOC)
Computer vision; Image and video segmentation