Enhancing Ambiguous Dynamic Facial Expression Recognition with Soft Label-based Data Augmentation

📅 2025-06-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of accurately recognizing ambiguous and noisy facial expressions in real-world dynamic facial expression recognition (DFER), this paper proposes MIDAS, a soft-label data augmentation method. MIDAS extends mixup to video sequences with soft labels by convexly combining pairs of video frames and their corresponding multi-class emotion probability distributions, thereby explicitly modeling expression ambiguity. MIDAS requires no additional annotations and adds only a lightweight augmentation step during training, improving model robustness to boundary-ambiguous and low-confidence expressions. Experiments on the DFEW benchmark and on FERV39k-Plus, a newly constructed dataset that assigns soft labels to an existing DFER dataset, show that models trained with MIDAS-augmented data outperform the state-of-the-art method trained on the original data. These results support the effectiveness and generalizability of soft-label video augmentation for DFER under realistic, imperfect conditions.

📝 Abstract
Dynamic facial expression recognition (DFER) is a task that estimates emotions from facial expression video sequences. For practical applications, accurately recognizing ambiguous facial expressions -- frequently encountered in in-the-wild data -- is essential. In this study, we propose MIDAS, a data augmentation method designed to enhance DFER performance for ambiguous facial expression data using soft labels representing probabilities of multiple emotion classes. MIDAS augments training data by convexly combining pairs of video frames and their corresponding emotion class labels. This approach extends mixup to soft-labeled video data, offering a simple yet highly effective method for handling ambiguity in DFER. To evaluate MIDAS, we conducted experiments on both the DFEW dataset and FERV39k-Plus, a newly constructed dataset that assigns soft labels to an existing DFER dataset. The results demonstrate that models trained with MIDAS-augmented data achieve superior performance compared to the state-of-the-art method trained on the original dataset.
Problem

Research questions and friction points this paper is trying to address.

Improving recognition of ambiguous facial expressions in videos
Enhancing DFER with soft label-based data augmentation
Handling emotion ambiguity in dynamic facial expression datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft label-based data augmentation for DFER
Convex combination of video frames and labels
Extends mixup to soft-labeled video data
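The convex combination described above can be sketched in a few lines. This is a hypothetical NumPy illustration of mixup extended to soft-labeled video clips, not the paper's reference implementation; the function name `video_mixup`, the Beta-distributed mixing coefficient (borrowed from standard mixup), and the tensor layout are all assumptions.

```python
import numpy as np

def video_mixup(frames_a, labels_a, frames_b, labels_b, alpha=0.2, rng=None):
    """Convexly combine two video clips and their soft emotion labels.

    frames_*: float arrays of shape (T, H, W, C), values in [0, 1].
    labels_*: probability vectors of shape (K,) over K emotion classes.
    Sketch only -- MIDAS's exact formulation may differ.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Mixing coefficient drawn from Beta(alpha, alpha), as in standard mixup.
    lam = rng.beta(alpha, alpha)
    # Apply the same convex combination to every frame and to the soft labels.
    mixed_frames = lam * frames_a + (1.0 - lam) * frames_b
    mixed_labels = lam * labels_a + (1.0 - lam) * labels_b
    return mixed_frames, mixed_labels
```

Because both inputs carry probability distributions rather than one-hot labels, the mixed label remains a valid distribution (non-negative, summing to 1), which is what lets the augmentation express emotion ambiguity directly.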