Micro-Expression Recognition via Fine-Grained Dynamic Perception

📅 2025-09-07
🤖 AI Summary
Micro-expression recognition (MER) faces two challenges: spatiotemporal dynamics that are hard to model, and a severe scarcity of labeled training data. To address them, the authors propose a fine-grained dynamic perception (FDP) framework with three key components: (1) a local-global feature-aware transformer for frame-level representation learning; (2) a rank scorer that ranks frame-level features in chronological order, so that the ranking encodes the dynamics of both appearance and motion; and (3) a dynamic image construction task, trained jointly with recognition, that sharpens sensitivity to subtle facial movements and alleviates data scarcity. The rank features are pooled along the temporal dimension into a dynamic representation shared by the recognition and construction modules. On four benchmark datasets (CASME II, SAMM, CAS(ME)², and CAS(ME)³), the method improves F1-score over the previous best results by 4.05%, 2.50%, 7.71%, and 2.11%, respectively.

📝 Abstract
Facial micro-expression recognition (MER) is a challenging task due to the transience, subtlety, and dynamics of micro-expressions (MEs). Most existing methods resort to hand-crafted features or deep networks; the former often additionally require key frames, while the latter suffer from small-scale, low-diversity training data. In this paper, we develop a novel fine-grained dynamic perception (FDP) framework for MER. We propose to rank frame-level features of a sequence of raw frames in chronological order, so that the ranking process encodes the dynamic information of both ME appearances and motions. Specifically, a novel local-global feature-aware transformer is proposed for frame representation learning. A rank scorer is then adopted to calculate a rank score for each frame-level feature. Afterwards, the rank features from the rank scorer are pooled along the temporal dimension to capture a dynamic representation. Finally, the dynamic representation is shared by an MER module and a dynamic image construction module: the former predicts the ME category, and the latter uses an encoder-decoder structure to construct the dynamic image. The dynamic image construction task helps capture the subtle facial actions associated with MEs and alleviates the data scarcity issue. Extensive experiments show that our method (i) significantly outperforms the state-of-the-art MER methods, and (ii) works well for dynamic image construction. In particular, our FDP improves by 4.05%, 2.50%, 7.71%, and 2.11% over the previous best results in terms of F1-score on the CASME II, SAMM, CAS(ME)², and CAS(ME)³ datasets, respectively. The code is available at https://github.com/CYF-cuber/FDP.
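
The ranking and pooling steps described in the abstract can be illustrated with a small sketch. The abstract does not state the ranking objective or the pooling operator, so everything here is an assumption: `pairwise_rank_loss` is a hypothetical pairwise hinge loss that asks later frames to score higher than earlier ones, and `temporal_mean_pool` uses a plain mean over time. Neither is taken from the FDP code.

```python
from itertools import combinations


def pairwise_rank_loss(scores, margin=1.0):
    """Hypothetical chronological ranking loss (not from the paper).

    `scores` holds one scalar rank score per frame, in time order.
    The loss is zero only when every later frame outscores every
    earlier frame by at least `margin`.
    """
    pairs = list(combinations(range(len(scores)), 2))  # all (i, j) with i < j
    if not pairs:
        return 0.0
    total = sum(max(0.0, margin - (scores[j] - scores[i])) for i, j in pairs)
    return total / len(pairs)


def temporal_mean_pool(features):
    """Average equal-length per-frame feature vectors over time.

    Mean pooling is an assumed choice; the abstract only says the
    rank features are pooled in the temporal dimension.
    """
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


if __name__ == "__main__":
    print(pairwise_rank_loss([0.0, 1.0, 2.0, 3.0]))         # chronologically ordered scores
    print(pairwise_rank_loss([3.0, 2.0, 1.0, 0.0]))         # reversed scores are penalized
    print(temporal_mean_pool([[1.0, 2.0], [3.0, 4.0]]))     # one pooled vector
```

In this sketch the pooled vector plays the role of the shared dynamic representation that would feed both the recognition head and the dynamic image decoder.
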
Problem

Research questions and friction points this paper is trying to address.

Recognizing subtle facial micro-expressions with limited training data
Capturing dynamic appearance and motion information in micro-expressions
Addressing data scarcity and low-diversity issues in micro-expression recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained dynamic perception framework for MER
Local-global feature-aware transformer for frame-level representation learning
Dynamic image construction to alleviate data scarcity
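
For readers unfamiliar with dynamic images, the sketch below shows one common way to collapse a frame sequence into a single image: a weighted sum with linear weights alpha_t = 2t - T - 1, a simplification in the spirit of approximate rank pooling. Whether FDP builds its construction target this way is not stated on this page, so the function name `dynamic_image` and the weighting are assumptions for illustration only.

```python
def dynamic_image(frames):
    """Collapse T frames into one image with weights alpha_t = 2t - T - 1.

    `frames` is a list of T grayscale frames, each a 2-D list of pixel
    values with identical shape. Later frames get positive weight and
    earlier frames negative weight, so static pixels cancel out and
    pixels that change over time stand out. (Assumed weighting, not
    confirmed as the paper's construction target.)
    """
    num_frames = len(frames)
    if num_frames == 0:
        raise ValueError("need at least one frame")
    weights = [2 * t - num_frames - 1 for t in range(1, num_frames + 1)]
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(w * f[r][c] for w, f in zip(weights, frames)) for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    # Three 1x2 frames: the left pixel brightens over time, the right one is static.
    clip = [[[0, 5]], [[1, 5]], [[2, 5]]]
    print(dynamic_image(clip))
```

Because the weights sum to zero, a pixel that never changes contributes nothing, which is why such a target emphasizes subtle motion rather than static appearance.
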
Authors
Zhiwen Shao
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China, and Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, China, and Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Yifan Cheng
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China, and Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou, China, and Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
Fan Zhang
Inspur Zhuoshu Big Data Industry Development Co., Ltd., Jinan, China
Xuehuai Shi
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China
Canlin Li
School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
Lizhuang Ma
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Dit-Yan Yeung
Chair Professor, Department of CSE, HKUST, Hong Kong
Machine Learning, Artificial Intelligence, Computer Vision