Micro-Expression Recognition via Fine-Grained Dynamic Perception

📅 2025-09-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Micro-expression recognition (MER) faces two challenges: dynamic spatiotemporal information is hard to model, and labeled training data are scarce. To address these, we propose a fine-grained dynamic perception (FDP) framework with three key components: (1) a local-global feature-aware Transformer for frame-level representation learning; (2) a rank scorer that ranks frame-level features in chronological order, so that the ranking encodes the dynamics of both appearance and motion; and (3) a jointly trained dynamic image construction task that makes the model more sensitive to subtle facial movements and alleviates data scarcity. The rank features are pooled along the temporal dimension into a dynamic representation shared by the recognition and construction modules. On four benchmark datasets (CASME II, SAMM, CAS(ME)², and CAS(ME)³), our method improves F1-score over the previous best results by 4.05%, 2.50%, 7.71%, and 2.11%, respectively.

📝 Abstract

Facial micro-expression recognition (MER) is a challenging task because micro-expressions (MEs) are transient, subtle, and dynamic. Most existing methods rely on either hand-crafted features or deep networks: the former often additionally require key frames, and the latter suffer from small-scale, low-diversity training data. In this paper, we develop a novel fine-grained dynamic perception (FDP) framework for MER. We propose to rank the frame-level features of a sequence of raw frames in chronological order, so that the ranking process encodes the dynamic information of both ME appearances and motions. Specifically, a novel local-global feature-aware transformer is proposed for frame representation learning. A rank scorer then calculates a rank score for each frame-level feature. Afterwards, the rank features from the rank scorer are pooled along the temporal dimension to capture a dynamic representation. Finally, the dynamic representation is shared by an MER module and a dynamic image construction module: the former predicts the ME category, and the latter uses an encoder-decoder structure to construct the dynamic image. The dynamic image construction task helps capture the subtle facial actions associated with MEs and alleviates the data scarcity issue. Extensive experiments show that our method (i) significantly outperforms the state-of-the-art MER methods, and (ii) works well for dynamic image construction. In particular, our FDP improves by 4.05%, 2.50%, 7.71%, and 2.11% over the previous best results in terms of F1-score on the CASME II, SAMM, CAS(ME)^2, and CAS(ME)^3 datasets, respectively. The code is available at https://github.com/CYF-cuber/FDP.
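The chronological ranking and temporal pooling steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the ranking loss or the pooling operator, so the pairwise hinge loss, the `margin` parameter, and the mean pooling below are assumptions chosen for illustration; the function names are hypothetical and are not taken from the FDP repository.

```python
from typing import List, Sequence


def pairwise_rank_loss(scores: Sequence[float], margin: float = 1.0) -> float:
    """Mean hinge loss that pushes later frames to score higher than earlier ones.

    Assumption: a pairwise hinge loss stands in for whatever ranking
    objective the paper actually uses.
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            # Frame j follows frame i, so we want scores[j] >= scores[i] + margin.
            total += max(0.0, margin - (scores[j] - scores[i]))
            pairs += 1
    return total / pairs if pairs else 0.0


def temporal_mean_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors over the time axis.

    Assumption: mean pooling stands in for the paper's temporal pooling.
    """
    if not features:
        raise ValueError("need at least one frame-level feature vector")
    num_frames = len(features)
    # zip(*features) iterates over feature dimensions across all frames.
    return [sum(column) / num_frames for column in zip(*features)]


if __name__ == "__main__":
    print(pairwise_rank_loss([0.0, 1.0, 2.0]))   # scores already in chronological order
    print(pairwise_rank_loss([2.0, 1.0, 0.0]))   # scores in reversed order are penalized
    print(temporal_mean_pool([[1.0, 2.0], [3.0, 4.0]]))
```

Under these assumptions, the loss is zero only when every later frame outscores every earlier one by at least the margin, which is one way a scorer can be made to encode temporal order.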

Problem

Research questions and friction points this paper is trying to address.

Recognizing subtle facial micro-expressions with limited training data

Capturing dynamic appearance and motion information in micro-expressions

Addressing data scarcity and low-diversity issues in micro-expression recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained dynamic perception framework for MER

Local-global feature-aware transformer for representation

Dynamic image construction to alleviate data scarcity
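To make the notion of a dynamic image concrete, here is a minimal sketch of one common way to compute it: approximate rank pooling with the simplified linear coefficients α_t = 2t − T − 1 from the dynamic image literature. Whether FDP builds its construction target this way is an assumption; the page does not say. Frames are represented as flattened lists of pixel intensities, and the function names are hypothetical.

```python
from typing import List, Sequence


def rank_pooling_coefficients(num_frames: int) -> List[int]:
    """Simplified approximate rank pooling weights: alpha_t = 2t - T - 1, t = 1..T."""
    return [2 * t - num_frames - 1 for t in range(1, num_frames + 1)]


def dynamic_image(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse T flattened frames into one image by a weighted temporal sum.

    Later frames get positive weights and earlier frames negative ones,
    so static pixels cancel out and pixels that change over time remain.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_pixels = len(frames[0])
    if any(len(frame) != num_pixels for frame in frames):
        raise ValueError("all frames must have the same number of pixels")
    coeffs = rank_pooling_coefficients(len(frames))
    return [
        sum(alpha * frame[p] for alpha, frame in zip(coeffs, frames))
        for p in range(num_pixels)
    ]


if __name__ == "__main__":
    # Pixel 0 brightens over time, pixel 1 stays constant.
    clip = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]]
    print(dynamic_image(clip))
```

Because the coefficients sum to zero, a pixel that never changes contributes nothing to the result, which matches the intuition that a dynamic image highlights subtle motion rather than static appearance.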
