🤖 AI Summary
To address the challenge of accurately recognizing ambiguous and noisy facial expressions in real-world dynamic facial expression recognition (DFER), this paper proposes MIDAS, a soft-label data augmentation method. MIDAS extends MixUp to video sequences with soft labels by convexly combining pairs of video frames and their corresponding multi-class emotion probability distributions, thereby explicitly modeling expression ambiguity. Crucially, MIDAS requires no additional annotations and adds only a lightweight augmentation step during training, improving robustness to boundary-ambiguous and low-confidence expressions. Experiments on the DFEW benchmark and on FERV39k-Plus, a newly constructed large-scale soft-labeled dataset, show that models trained with MIDAS consistently outperform state-of-the-art methods. These results support soft-label video augmentation as a principled and practical approach to DFER under realistic, imperfect conditions.
📝 Abstract
Dynamic facial expression recognition (DFER) is a task that estimates emotions from facial expression video sequences. For practical applications, accurately recognizing ambiguous facial expressions -- frequently encountered in in-the-wild data -- is essential. In this study, we propose MIDAS, a data augmentation method designed to enhance DFER performance for ambiguous facial expression data using soft labels representing probabilities of multiple emotion classes. MIDAS augments training data by convexly combining pairs of video frames and their corresponding emotion class labels. This approach extends mixup to soft-labeled video data, offering a simple yet highly effective method for handling ambiguity in DFER. To evaluate MIDAS, we conducted experiments on both the DFEW dataset and FERV39k-Plus, a newly constructed dataset that assigns soft labels to an existing DFER dataset. The results demonstrate that models trained with MIDAS-augmented data achieve superior performance compared to the state-of-the-art method trained on the original dataset.
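The convex-combination step described above can be sketched in a few lines. This is a minimal illustration, assuming the standard mixup recipe of a Beta-sampled mixing coefficient; the function name, tensor layout, and sampling scheme are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def midas_mix(video_a, video_b, label_a, label_b, alpha=0.2, rng=None):
    """Mixup-style augmentation for a pair of soft-labeled videos (sketch).

    video_a, video_b: arrays of shape (T, H, W, C) -- frame sequences.
    label_a, label_b: soft-label probability vectors over emotion classes.
    alpha: Beta-distribution parameter, as in standard mixup (assumed here;
    the paper's exact sampling scheme may differ).
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)  # mixing coefficient in [0, 1]
    # Convexly combine both the frames and the soft labels with the same weight.
    mixed_video = lam * video_a + (1.0 - lam) * video_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_video, mixed_label
```

Because both inputs are valid probability distributions and the weights sum to one, the mixed label remains a valid soft label, which is what lets the augmented sample represent an ambiguous expression between two emotion classes.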