🤖 AI Summary
Compound Expression (CE) recognition faces dual challenges: scarce labeled data and ambiguous emotional boundaries. To address these, we propose a three-stage progressive curriculum learning framework designed specifically for CE: (1) pretraining the model on single-expression data; (2) dynamically generating high-quality synthetic CE samples in an unsupervised manner via CutMix/Mixup; and (3) incrementally fusing multi-expression data to enhance generalization. This work introduces the first curriculum learning paradigm tailored to compound expressions and the first image-mixing-based unsupervised CE generation strategy, effectively alleviating the data bottleneck. Evaluated on the 7th ABAW Competition's Compound Expression Challenge, our method achieves a state-of-the-art F1-score of 0.6063, securing first place.
📝 Abstract
With the advent of deep learning, expression recognition has made significant advancements. However, due to the limited availability of annotated compound expression datasets and the subtle variations among compound expressions, Compound Emotion Recognition (CE) still holds considerable potential for exploration. To advance this task, the 7th Affective Behavior Analysis in-the-wild (ABAW) competition introduces the Compound Expression Challenge based on C-EXPR-DB, a limited dataset without labels. In this paper, we present a curriculum learning-based framework that initially trains the model on single-expression tasks and subsequently incorporates multi-expression data. This design ensures that our model first masters the fundamental features of basic expressions before being exposed to the complexities of compound emotions. Specifically, our design can be summarized as follows: 1) Single-Expression Pre-training: The model is first trained on datasets containing single expressions to learn the foundational facial features associated with basic emotions. 2) Dynamic Compound Expression Generation: Given the scarcity of annotated compound expression datasets, we apply CutMix and Mixup to the original single-expression images to create hybrid images exhibiting characteristics of multiple basic emotions. 3) Incremental Multi-Expression Integration: After performing well on single-expression tasks, the model is progressively exposed to multi-expression data, allowing it to adapt to the complexity and variability of compound expressions. The official results indicate that our method achieves the **best** performance in this competition track with an F1-score of 0.6063. Our code is released at https://github.com/YenanLiu/ABAW7th.
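The dynamic compound-expression generation step can be illustrated with standard CutMix and Mixup operations on image arrays. The sketch below is a minimal NumPy illustration, not the paper's actual implementation: function names, the Beta-distribution parameters, and the label-weighting convention are assumptions chosen for clarity.

```python
import numpy as np

def mixup(img_a, img_b, alpha=0.4, rng=None):
    """Mixup: blend two single-expression images into one hybrid image.

    Returns the mixed image and the coefficient lam in [0, 1]; lam can be
    used to weight the two source expression labels (assumed convention).
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    mixed = lam * img_a + (1.0 - lam) * img_b
    return mixed, lam

def cutmix(img_a, img_b, alpha=1.0, rng=None):
    """CutMix: paste a random rectangular patch of img_b onto img_a.

    Returns the mixed image and lam, the fraction of pixels still
    coming from img_a after the paste.
    """
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)
    cut_ratio = np.sqrt(1.0 - lam)          # patch side length ratio
    ch, cw = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = rng.integers(h), rng.integers(w)  # random patch center
    y1, y2 = np.clip(cy - ch // 2, 0, h), np.clip(cy + ch // 2, 0, h)
    x1, x2 = np.clip(cx - cw // 2, 0, w), np.clip(cx + cw // 2, 0, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, lam_adj
```

In a curriculum like the one described, such hybrids of two basic-expression images (e.g. one labeled "happy" and one labeled "surprised") would serve as synthetic compound samples during the second training stage.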