Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the significant performance degradation of facial expression recognition (FER) under occlusion, this paper proposes ORSANet, a semantic-aware network. Methodologically, it introduces semantic segmentation maps and facial landmarks as dense semantic and sparse geometric priors (a first in FER), and designs a multi-scale cross-modal interaction module for disentangled feature learning across modalities, alongside a dynamic adversarial repulsion enhancement loss that strengthens inter-class discriminability. The technical contributions are: (1) explicit modeling of the semantic discrepancy between occlusions and facial expressions; and (2) robust, multi-granularity prior-guided feature learning. ORSANet achieves state-of-the-art performance on AffectNet, RAF-DB, and a newly constructed occlusion-specific benchmark, Occlu-FER. Notably, it yields substantial accuracy improvements under occlusion, demonstrating both effectiveness and strong generalization capability.

📝 Abstract
Facial expression recognition (FER) is a challenging task due to pervasive occlusion and dataset biases. Especially when facial information is partially occluded, existing FER models struggle to extract effective facial features, leading to inaccurate classifications. In response, we present ORSANet, which introduces the following three key contributions: First, we introduce auxiliary multi-modal semantic guidance to disambiguate facial occlusion and learn high-level semantic knowledge, which is two-fold: 1) we introduce semantic segmentation maps as a dense semantic prior to generate semantics-enhanced facial representations; 2) we introduce facial landmarks as a sparse geometric prior to mitigate intrinsic noise in FER, such as identity and gender biases. Second, to facilitate the effective incorporation of these two multi-modal priors, we customize a Multi-scale Cross-interaction Module (MCM) to adaptively fuse the landmark features and semantics-enhanced representations at different scales. Third, we design a Dynamic Adversarial Repulsion Enhancement Loss (DARELoss) that dynamically adjusts the margins of ambiguous classes, further enhancing the model's ability to distinguish similar expressions. We further construct the first occlusion-oriented FER dataset, dubbed Occlu-FER, to facilitate specialized robustness analysis under various real-world occlusion conditions. Extensive experiments on both public benchmarks and Occlu-FER demonstrate that our proposed ORSANet achieves SOTA recognition performance. Code is publicly available at https://github.com/Wenyuzhy/ORSANet-master.
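The abstract describes DARELoss only at a high level: margins between ambiguous classes are adjusted dynamically so that easily confused expressions are pushed apart. A minimal NumPy sketch of that idea, assuming a margin-softmax formulation in which rival-class logits are inflated in proportion to how often each class is confused with the true class (the function name, the `confusion` matrix input, and the exact margin rule are illustrative assumptions, not the paper's actual formulation):

```python
import numpy as np

def dynamic_margin_loss(logits, labels, confusion, base_margin=0.2):
    """Illustrative dynamic-margin cross-entropy (hypothetical reading
    of DARELoss): rival classes that are frequently confused with the
    true class get their logits inflated by a margin, so the model must
    separate ambiguous expressions by a wider gap to reduce the loss."""
    n, c = logits.shape
    adjusted = logits.copy()
    for i, y in enumerate(labels):
        # Margin for each rival class grows with its confusion rate
        # against the true class y; confusion[y, y] is assumed to be 0.
        adjusted[i] += base_margin * confusion[y]
        adjusted[i, y] = logits[i, y]  # true-class logit stays unchanged
    # Standard softmax cross-entropy on the margin-adjusted logits.
    z = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), labels].mean()
```

With `base_margin=0`, this reduces to plain cross-entropy; a positive margin makes the loss strictly harder for confusable classes, which is the repulsion effect the abstract describes. The paper's "adversarial" and "dynamic" components presumably update the margins during training, which this static sketch omits.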
Problem

Research questions and friction points this paper is trying to address.

Improves facial expression recognition under occlusion conditions
Integrates multi-modal semantic guidance to reduce biases
Enhances the model's ability to distinguish similar expressions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal semantic guidance for occlusion disambiguation
Multi-scale Cross-interaction Module for feature fusion
Dynamic Adversarial Repulsion Enhancement Loss for classification
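The fusion idea behind the Multi-scale Cross-interaction Module can be sketched generically: at each feature scale, sparse landmark features query the dense semantics-enhanced features via cross-attention, and the result is fused back with a residual connection. The sketch below is an assumption about the general mechanism (single-head attention, residual fusion, averaging nothing across scales), not the paper's actual MCM design:

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(query, context):
    """Single-head scaled dot-product cross-attention: each query token
    (landmark feature) gathers information from the context tokens
    (semantics-enhanced features)."""
    d = query.shape[-1]
    attn = softmax(query @ context.T / np.sqrt(d))
    return attn @ context

def multiscale_cross_fusion(landmark_feats, semantic_feats):
    """Hypothetical MCM-style fusion: per scale, landmark features
    attend to semantic features and are fused via a residual add.
    Inputs are lists of (tokens, dim) arrays, one entry per scale."""
    return [q + cross_attend(q, c)  # residual fusion at each scale
            for q, c in zip(landmark_feats, semantic_feats)]
```

Each scale keeps its own token count and dimensionality; how ORSANet actually projects, gates, or aggregates the scales is specified in the paper, not here.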
Authors

Huiyu Zhai (UESTC)
Xingxing Yang (Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China)
Yalan Ye (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China)
Chenyang Li (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China)
Bin Fan (School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China)
Changze Li (University of Electronic Science and Technology of China)