A Lightweight Attention-based Deep Network via Multi-Scale Feature Fusion for Multi-View Facial Expression Recognition

📅 2024-03-21
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
This paper addresses the high computational complexity and poor pose robustness of CNN-based models for multi-view facial expression recognition (FER) in real-world scenarios. It proposes LANMSFF, a lightweight attention network with multi-scale feature fusion. The method introduces MassAtt, a joint channel-spatial attention mechanism that recalibrates feature maps toward discriminative features, and PWFS, a point-wise feature selection module that discards less meaningful features before fusion rather than fusing multi-scale features directly. Evaluated on KDEF, FER-2013, and FERPlus, LANMSFF achieves accuracies of 90.77%, 70.44%, and 86.96%, respectively, with parameter count and robustness to head pose variation comparable to state-of-the-art methods.

📝 Abstract
Convolutional neural networks (CNNs) and their variants have shown effectiveness in facial expression recognition (FER). However, they face challenges when dealing with high computational complexity and multi-view head poses in real-world scenarios. We introduce a lightweight attentional network incorporating multi-scale feature fusion (LANMSFF) to tackle these issues. For the first challenge, we carefully design a lightweight network. We address the second challenge by presenting two novel components, namely mass attention (MassAtt) and point-wise feature selection (PWFS) blocks. The MassAtt block simultaneously generates channel and spatial attention maps to recalibrate feature maps by emphasizing important features while suppressing irrelevant ones. In addition, the PWFS block employs a feature selection mechanism that discards less meaningful features prior to the fusion process. This mechanism distinguishes it from previous methods that directly fuse multi-scale features. Our proposed approach achieved results comparable to state-of-the-art methods in terms of parameter count and robustness to pose variation, with accuracy rates of 90.77% on the KDEF, 70.44% on the FER-2013, and 86.96% on the FERPlus datasets. The code for LANMSFF is available at https://github.com/AE-1129/LANMSFF.
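The abstract describes MassAtt only at a high level: channel and spatial attention maps are generated at the same time and used to recalibrate a feature map. The NumPy sketch below illustrates that general idea and nothing more. It is not the authors' block; the name `joint_attention`, the weight shapes, the average pooling, and the sigmoid gating are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def joint_attention(x, w_channel, w_spatial):
    """Recalibrate a (C, H, W) feature map with channel and spatial maps.

    Hypothetical parameters (not taken from the paper):
      w_channel: (C, C) weights applied to the pooled channel descriptor.
      w_spatial: (C,) weights of a 1x1 projection across channels.
    """
    # Channel branch: global average pool, linear map, sigmoid -> (C, 1, 1).
    pooled = x.mean(axis=(1, 2))
    channel_map = sigmoid(w_channel @ pooled)[:, None, None]

    # Spatial branch: weighted sum over channels, sigmoid -> (1, H, W).
    spatial_map = sigmoid(np.einsum("c,chw->hw", w_spatial, x))[None, :, :]

    # Both maps are computed from the same input and applied together,
    # scaling important responses up relative to irrelevant ones.
    return x * channel_map * spatial_map


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))            # toy feature map
w_c = 0.1 * rng.standard_normal((8, 8))       # assumed channel weights
w_s = 0.1 * rng.standard_normal(8)            # assumed spatial weights
out = joint_attention(x, w_c, w_s)
print(out.shape)
```

Because both maps lie in (0, 1), the output never exceeds the input in magnitude; the block only reweights existing responses. The real MassAtt design is in the linked repository.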
Problem

Research questions and friction points this paper is trying to address.

CNN-based FER models carry high computational complexity.
Multi-view head poses in real-world scenarios make expression recognition harder.
Directly fusing multi-scale features passes less meaningful features into the fused representation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight attentional network design (LANMSFF)
Mass attention (MassAtt) for channel and spatial feature recalibration
Point-wise feature selection (PWFS) applied before multi-scale fusion
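
To make the "select, then fuse" idea concrete, here is a minimal NumPy sketch of point-wise feature selection applied before multi-scale fusion. It is an illustration under stated assumptions, not the paper's PWFS block: the names `pointwise_select` and `fuse_selected`, the 1x1 scoring weights, the hard threshold, and fusion by summation are all hypothetical choices.

```python
import numpy as np


def pointwise_select(feat, w, keep_threshold=0.5):
    """Zero out low-scoring entries of a (C, H, W) feature map.

    w is a hypothetical (C, C) weight matrix acting like a 1x1 convolution:
    it mixes channels independently at every spatial position.
    """
    logits = np.einsum("oc,chw->ohw", w, feat)
    scores = 1.0 / (1.0 + np.exp(-logits))     # per-entry score in (0, 1)
    mask = scores >= keep_threshold            # assumed hard selection rule
    return feat * mask


def fuse_selected(features, weights, keep_threshold=0.5):
    """Select within each scale first, then fuse by summation.

    All feature maps are assumed to be already resized to one (C, H, W) shape.
    """
    selected = [
        pointwise_select(f, w, keep_threshold)
        for f, w in zip(features, weights)
    ]
    return np.sum(selected, axis=0)


rng = np.random.default_rng(0)
features = [rng.standard_normal((4, 5, 5)) for _ in range(3)]   # three scales
weights = [0.5 * rng.standard_normal((4, 4)) for _ in range(3)]
fused = fuse_selected(features, weights)
print(fused.shape)
```

The contrast with direct fusion is the order of operations: direct fusion would sum `features` as they are, whereas here each scale is filtered before it contributes to the sum.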