ARPGNet: Appearance- and Relation-aware Parallel Graph Attention Fusion Network for Facial Expression Recognition

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing facial expression recognition methods predominantly rely on CNNs to extract static appearance features, neglecting relational modeling among facial regions. To address this, we propose the Appearance- and Relation-aware Parallel Graph attention fusion Network (ARPGNet). First, frame-level appearance features are extracted using a pretrained CNN. Second, a facial region relation graph is constructed, and graph attention mechanisms are employed to jointly model spatial topology and temporal evolution. Third, a parallel fusion module enables complementary interaction between the appearance and relation sequences, facilitating effective spatiotemporal dynamic modeling. Extensive experiments demonstrate that ARPGNet achieves state-of-the-art or competitive performance on three benchmark datasets—RAF-DB, FER2013, and AffectNet—validating the efficacy of jointly modeling appearance representations and inter-region relational dynamics for enhanced spatiotemporal discriminability.

📝 Abstract
The key to facial expression recognition is to learn discriminative spatial-temporal representations that embed facial expression dynamics. Previous studies predominantly rely on pre-trained Convolutional Neural Networks (CNNs) to learn facial appearance representations, overlooking the relationships between facial regions. To address this issue, this paper presents an Appearance- and Relation-aware Parallel Graph attention fusion Network (ARPGNet) to learn mutually enhanced spatial-temporal representations of appearance and relation information. Specifically, we construct a facial region relation graph and leverage the graph attention mechanism to model the relationships between facial regions. The resulting relational representation sequences, along with CNN-based appearance representation sequences, are then fed into a parallel graph attention fusion module for mutual interaction and enhancement. This module simultaneously explores the complementarity between different representation sequences and the temporal dynamics within each sequence. Experimental results on three facial expression recognition datasets demonstrate that the proposed ARPGNet outperforms or is comparable to state-of-the-art methods.
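The abstract's relation branch builds a facial region graph and applies graph attention to model inter-region relationships. The paper's exact layer is not reproduced here; as an illustrative sketch only, a standard single-head graph-attention layer over facial-region features might look like the following (all function names, shapes, and parameter choices are assumptions, not the authors' implementation):

```python
import numpy as np

def graph_attention(H, adj, W, a, slope=0.2):
    """Single-head graph-attention layer (illustrative sketch).

    H:   (N, F)  features for N facial regions
    adj: (N, N)  0/1 adjacency of the region relation graph
    W:   (F, Fp) shared linear projection
    a:   (2*Fp,) attention vector, split into source/target halves
    """
    Wh = H @ W                                   # project: (N, Fp)
    Fp = Wh.shape[1]
    # logits e_ij = LeakyReLU(a^T [Wh_i || Wh_j]), computed via the
    # usual decomposition into a source term and a target term
    src = Wh @ a[:Fp]                            # (N,)
    dst = Wh @ a[Fp:]                            # (N,)
    e = src[:, None] + dst[None, :]              # (N, N)
    e = np.where(e > 0, e, slope * e)            # LeakyReLU
    e = np.where(adj > 0, e, -1e9)               # mask non-neighbouring regions
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)   # row-wise softmax over neighbours
    return att @ Wh                              # (N, Fp) relation-aware features
```

Applied per frame, such a layer yields the relational representation sequence that, per the abstract, is fed alongside the CNN appearance sequence into the parallel fusion module.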
Problem

Research questions and friction points this paper is trying to address.

Models relationships between facial regions for expression recognition
Fuses appearance and relational information via parallel graph attention
Enhances spatial-temporal representations to improve recognition accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel graph attention fusion for appearance and relation
Graph attention mechanism models facial region relationships
Mutual interaction enhances spatial-temporal representation sequences
Yan Li
School of Computer Science, Northwestern Polytechnical University, Beilin District, Xi’an, Shaanxi Province, 710072, China; also with the Pengcheng Laboratory, Nanshan District, Shenzhen, Guangdong Province, 518055, China.
Yong Zhao
Zhejiang Lab, Yuhang District, Hangzhou, Zhejiang Province, 311100, China.
Xiaohan Xia
School of Computer Science, Northwestern Polytechnical University, Beilin District, Xi’an, Shaanxi Province, 710072, China.
Dongmei Jiang
Northwestern Polytechnical University; Peng Cheng Laboratory
Affective Computing · Multimodal emotion recognition · Multimodal mental state evaluation