🤖 AI Summary
To address performance degradation in in-the-wild video-based facial expression recognition caused by label ambiguity and class imbalance, this paper proposes a video-level noise-aware adaptive frame weighting method. It dynamically assigns per-frame weights via confidence calibration, integrates temporally aware label smoothing with consecutive-frame difference cues to model temporal dynamics robustly, and introduces a lightweight redundancy-reducing frame augmentation strategy to mitigate overfitting. The method is embedded in an end-to-end CNN-LSTM hybrid architecture. On the Aff-Wild2 EXPR challenge, it achieves a 3.2% absolute accuracy improvement over the baseline, substantially alleviating the impact of label noise and the long-tailed class distribution. The core contributions are: (1) the first video-level noise-aware frame weighting mechanism that explicitly models label uncertainty across frames; and (2) a lightweight, temporally aware augmentation paradigm designed to suppress frame-level redundancy while preserving discriminative temporal patterns.
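The summary does not spell out the weighting formula, but the idea can be sketched: per-frame loss terms are reweighted by a calibrated confidence score, so frames with ambiguous labels contribute less to the clip loss. Below is a minimal PyTorch-style sketch; the function name `v_naw_loss`, the temperature-scaled softmax confidence, and the clip-level weight normalization are our own illustrative assumptions, not the paper's exact V-NAW formulation.

```python
# Minimal sketch of noise-aware adaptive frame weighting (assumption-based;
# the paper's actual V-NAW weighting function may differ).
import torch
import torch.nn.functional as F

def v_naw_loss(logits, labels, temperature=1.0):
    """Reweight per-frame cross-entropy by calibrated label confidence.

    logits: (T, C) per-frame class scores for one clip
    labels: (T,)   per-frame expression labels (long tensor)
    """
    # Per-frame cross-entropy, kept un-reduced so each frame can be weighted.
    ce = F.cross_entropy(logits, labels, reduction="none")           # (T,)

    # Confidence of the annotated label after temperature calibration:
    # low-confidence (likely ambiguous or noisy) frames get smaller weights.
    probs = F.softmax(logits / temperature, dim=-1)                  # (T, C)
    conf = probs.gather(1, labels.unsqueeze(1)).squeeze(1).detach()  # (T,)

    # Normalize weights over the clip so the overall loss scale stays stable.
    weights = conf / conf.sum().clamp_min(1e-8) * conf.numel()

    return (weights * ce).mean()
```

Detaching the confidence term keeps the weights from being directly optimized, so the model cannot trivially lower the loss by becoming confident on mislabeled frames.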
📝 Abstract
Facial Expression Recognition (FER) plays a crucial role in human affective analysis and has been widely applied in computer vision tasks such as human-computer interaction and psychological assessment. The 8th Affective Behavior Analysis in-the-Wild (ABAW) Challenge aims to assess human emotions using the video-based Aff-Wild2 dataset. The challenge comprises several tasks, including the video-based EXPR recognition track, which is our primary focus. In this paper, we demonstrate that addressing label ambiguity and class imbalance, both known causes of performance degradation, leads to meaningful performance improvements. Specifically, we propose Video-based Noise-aware Adaptive Weighting (V-NAW), which adaptively assigns importance to each frame in a clip to address label ambiguity and effectively capture temporal variations in facial expressions. Furthermore, we introduce a simple and effective augmentation strategy that reduces redundancy between consecutive frames, a primary cause of overfitting. Through extensive experiments, we validate the effectiveness of our approach, demonstrating significant improvements in video-based FER performance.
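The abstract only states that the augmentation reduces redundancy between consecutive frames. One plausible minimal instance is strided temporal sampling with random jitter, so that training clips contain fewer near-duplicate frames; the helper below (`sample_clip_indices`, `clip_len`, `max_stride`) is a hypothetical sketch, not the paper's actual strategy.

```python
# Hedged sketch of a redundancy-reducing clip augmentation: sample frames
# with a random stride and jitter instead of taking T consecutive frames.
# Names and parameters are illustrative assumptions. Assumes the video has
# at least `clip_len` frames.
import random

def sample_clip_indices(num_frames, clip_len=16, max_stride=4):
    """Pick `clip_len` frame indices from a video of `num_frames` frames."""
    stride = random.randint(1, max_stride)
    span = (clip_len - 1) * stride + 1
    if span > num_frames:
        # Fall back to the largest stride that still fits in the video.
        stride = max(1, (num_frames - 1) // max(clip_len - 1, 1))
        span = (clip_len - 1) * stride + 1
    start = random.randint(0, num_frames - span)
    # Small per-frame jitter breaks exact periodicity between epochs.
    return [min(start + i * stride + random.randint(0, max(stride - 1, 0)),
                num_frames - 1)
            for i in range(clip_len)]

# Example: spread a 16-frame training clip over a 300-frame video.
# idx = sample_clip_indices(300)
```

Randomizing the stride per clip also varies the effective temporal span seen during training, exposing the model to more diverse facial dynamics at no extra compute cost.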