🤖 AI Summary
This paper presents a submission to the 8th ABAW Challenge at CVPR 2025, addressing three facial affect analysis tasks: valence-arousal (VA) estimation, discrete emotion recognition, and facial action unit (AU) detection. Each task is treated as an independent challenge, with the established Dual-Direction Attention Mixed Feature Network (DDAMFN) serving as the backbone for all three. As a complementary experiment, the CLIP vision-language model is explored for the emotion recognition task. The approach surpasses the official challenge baselines on all three tasks, and the paper analyzes the architectural choices underlying this performance.
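To make the setup concrete, here is a minimal PyTorch sketch of one backbone feeding a task-specific head per challenge: VA regression, emotion classification, and multi-label AU detection. The backbone stub, feature dimension, and head sizes (8 emotion classes, 12 AUs, which match typical ABAW label sets) are illustrative assumptions, not the paper's implementation; the real DDAMFN uses a mixed-feature extractor with dual-direction attention.

```python
import torch
import torch.nn as nn

class DDAMFNBackboneStub(nn.Module):
    """Stand-in for the DDAMFN backbone (assumption: the real network is
    far deeper; this stub only mimics its image -> feature-vector role)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.stem(x).flatten(1)  # (B, feat_dim)

class AffectModel(nn.Module):
    """One backbone + one head; each ABAW task is trained independently."""
    def __init__(self, task, feat_dim=512, num_emotions=8, num_aus=12):
        super().__init__()
        self.backbone = DDAMFNBackboneStub(feat_dim)
        if task == "va":      # valence/arousal regression in [-1, 1]
            self.head = nn.Sequential(nn.Linear(feat_dim, 2), nn.Tanh())
        elif task == "expr":  # discrete emotion logits
            self.head = nn.Linear(feat_dim, num_emotions)
        elif task == "au":    # per-AU logits for multi-label detection
            self.head = nn.Linear(feat_dim, num_aus)
        else:
            raise ValueError(f"unknown task: {task}")

    def forward(self, x):
        return self.head(self.backbone(x))

model = AffectModel(task="va")
out = model(torch.randn(4, 3, 112, 112))  # -> (4, 2) valence/arousal
```

Training three such models separately, rather than one multi-task network, mirrors the paper's treatment of the subtasks as independent challenges.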
📝 Abstract
We present our contribution to the 8th ABAW challenge at CVPR 2025, where we tackle valence-arousal estimation, emotion recognition, and facial action unit detection as three independent challenges. Our approach leverages the well-known Dual-Direction Attention Mixed Feature Network (DDAMFN) for all three tasks, achieving results that surpass the official baselines. As a further experiment, we also explore the use of CLIP for the emotion recognition challenge. We provide insights into the architectural choices that contribute to the strong performance of our methods.
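The abstract does not specify how CLIP is applied to emotion recognition; one plausible recipe is a linear probe over frozen CLIP image embeddings. The sketch below, using the Hugging Face `transformers` CLIP API, illustrates that recipe under stated assumptions: the `openai/clip-vit-base-patch32` checkpoint and the 8-class head are hypothetical choices, not the paper's configuration.

```python
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen CLIP image encoder; only the linear probe below would be trained.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical classification head over CLIP's projected image features.
classifier = nn.Linear(clip.config.projection_dim, 8).to(device)

@torch.no_grad()
def embed(images):
    """Return L2-normalized CLIP image embeddings for a list of PIL images."""
    inputs = processor(images=images, return_tensors="pt").to(device)
    feats = clip.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

face = Image.new("RGB", (224, 224))   # placeholder for an aligned face crop
logits = classifier(embed([face]))    # (1, 8) emotion logits
```

Whether the paper fine-tunes CLIP end to end or probes frozen features is not stated; the frozen-encoder variant shown here is simply the lowest-cost starting point for such an experiment.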