🤖 AI Summary
To address the difficulty of extracting discriminative features in low-resolution facial expression recognition, which stems from missing fine-grained detail and weak global modeling, this paper proposes the global multiple extraction network (GME-Net). GME-Net comprises a hybrid attention-based local feature extraction module to enhance detail perception and a quasi-symmetric multi-scale global feature extraction module to strengthen structural modeling and suppress local noise. In addition, an attention-similarity knowledge distillation strategy transfers fine-grained detail from a high-resolution teacher network, jointly facilitating detail recovery and noise suppression. By integrating attention mechanisms, multi-scale representation learning, knowledge distillation, and a quasi-symmetric architectural design, GME-Net achieves state-of-the-art performance on benchmark datasets including FER2013 and RAF-DB, outperforming existing methods by 2.3–4.7% in average recognition accuracy. The proposed approach significantly improves robustness and generalization for facial expression recognition under low-resolution conditions.
📝 Abstract
Facial expression recognition, a vital computer vision task, has attracted significant attention and extensive research. Although facial expression recognition algorithms achieve impressive performance on high-resolution images, their effectiveness degrades on low-resolution images. We find this is because 1) low-resolution images lack detail information, and 2) current methods perform only weak global modeling, which makes it difficult to extract discriminative features. To alleviate these issues, we propose a novel global multiple extraction network (GME-Net) for low-resolution facial expression recognition, which incorporates 1) a hybrid attention-based local feature extraction module with attention-similarity knowledge distillation to learn image details from a high-resolution network, and 2) a multi-scale global feature extraction module with a quasi-symmetric structure to mitigate the influence of local image noise and facilitate capturing global image features. As a result, GME-Net can extract expression-related discriminative features. Extensive experiments on several widely used datasets demonstrate that the proposed GME-Net recognizes low-resolution facial expressions better and achieves superior performance over existing solutions.
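The attention-similarity distillation mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact formulation: the feature shapes, the squared-activation spatial attention map, and the L2 matching objective are common choices in attention-based distillation and are assumptions here.

```python
import numpy as np

def attention_map(features):
    """Collapse a C x H x W feature tensor into a unit-norm spatial
    attention vector via the channel-wise mean of squared activations.
    (Illustrative choice; the paper may define its maps differently.)"""
    amap = (features ** 2).mean(axis=0)      # C x H x W -> H x W
    flat = amap.flatten()
    return flat / (np.linalg.norm(flat) + 1e-8)

def attention_similarity_kd_loss(student_feats, teacher_feats):
    """Hypothetical distillation loss: squared L2 distance between the
    normalized attention maps of the low-resolution student and the
    high-resolution teacher, encouraging the student to attend to the
    same spatial regions."""
    qs = attention_map(student_feats)
    qt = attention_map(teacher_feats)
    return float(np.sum((qs - qt) ** 2))

rng = np.random.default_rng(0)
teacher = rng.standard_normal((16, 8, 8))    # teacher feature block
student = rng.standard_normal((16, 8, 8))    # student feature block
loss = attention_similarity_kd_loss(student, teacher)
identical = attention_similarity_kd_loss(teacher, teacher)
print(identical)  # 0.0 when attention maps match exactly
```

In training, a loss like this would be summed over several layer pairs and added to the expression-classification loss, so the student learns where the teacher looks even though its input lacks fine detail.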