ME-TST+: Micro-expression Analysis via Temporal State Transition with ROI Relationship Awareness

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Micro-expression analysis faces two key challenges: (1) sliding-window classification with a fixed window length cannot accommodate the transient nature and variable duration of micro-expressions; and (2) spotting and recognition are conventionally modeled separately, neglecting their intrinsic coupling. To address these, we propose ME-TST+, the first video-level end-to-end regression framework that jointly models spotting and recognition via a temporal state transition mechanism. Methodologically, it integrates three innovations: dynamic temporal modeling based on state space models, multi-granularity region-of-interest (ROI) feature extraction, and a dual-path (slow-fast) Mamba architecture, with collaborative optimization at both the feature and decision levels. Evaluated on benchmark datasets, including CASME III and SAMM, ME-TST+ achieves state-of-the-art performance, significantly improving spotting precision, recognition accuracy, and robustness across diverse micro-expression durations.

📝 Abstract
Micro-expressions (MEs) are regarded as important indicators of an individual's intrinsic emotions, preferences, and tendencies. ME analysis requires spotting of ME intervals within long video sequences and recognition of their corresponding emotional categories. Previous deep learning approaches commonly employ sliding-window classification networks. However, the use of fixed window lengths and hard classification presents notable limitations in practice. Furthermore, these methods typically treat ME spotting and recognition as two separate tasks, overlooking the essential relationship between them. To address these challenges, this paper proposes two state space model-based architectures, namely ME-TST and ME-TST+, which utilize temporal state transition mechanisms to replace conventional window-level classification with video-level regression. This enables a more precise characterization of the temporal dynamics of MEs and supports the modeling of MEs with varying durations. In ME-TST+, we further introduce multi-granularity ROI modeling and the slowfast Mamba framework to alleviate information loss associated with treating ME analysis as a time-series task. Additionally, we propose a synergy strategy for spotting and recognition at both the feature and result levels, leveraging their intrinsic connection to enhance overall analysis performance. Extensive experiments demonstrate that the proposed methods achieve state-of-the-art performance. The codes are available at https://github.com/zizheng-guo/ME-TST.
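To make the contrast with fixed-window classification concrete, the sketch below shows one way per-frame regression scores for a whole video could be turned into ME intervals of varying length. It is an illustrative assumption, not the decoding used in ME-TST or ME-TST+: the function name `intervals_from_scores`, the `threshold` and `min_len` parameters, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def intervals_from_scores(
    scores: List[float],
    threshold: float = 0.5,   # hypothetical cut-off on the per-frame score
    min_len: int = 1,         # hypothetical minimum interval length in frames
) -> List[Tuple[int, int]]:
    """Return inclusive (onset, offset) frame indices of runs with score >= threshold."""
    intervals: List[Tuple[int, int]] = []
    start = None  # index where the current above-threshold run began
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a run begins at this frame
        elif s < threshold and start is not None:
            if i - start >= min_len:
                intervals.append((start, i - 1))  # run ended on the previous frame
            start = None
    # Close a run that extends to the last frame of the video.
    if start is not None and len(scores) - start >= min_len:
        intervals.append((start, len(scores) - 1))
    return intervals


# Made-up scores for a 10-frame clip; the two runs have different lengths.
scores = [0.1, 0.2, 0.7, 0.9, 0.6, 0.2, 0.1, 0.8, 0.8, 0.1]
print(intervals_from_scores(scores))             # [(2, 4), (7, 8)]
print(intervals_from_scores(scores, min_len=3))  # [(2, 4)]
```

Because the interval boundaries come from the score curve itself, each detected interval can be as short or as long as the scores indicate, which a single fixed window length cannot express.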
Problem

Research questions and friction points this paper is trying to address.

Improving micro-expression analysis via temporal state transition models
Addressing limitations of fixed window lengths in ME classification
Integrating ME spotting and recognition for enhanced performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

State space model for temporal dynamics
Multi-granularity ROI and slowfast Mamba
Synergy strategy for spotting and recognition
Authors

Zizheng Guo
School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083, China
Bochao Zou
School of Computer and Communication Engineering, University of Science and Technology Beijing, 100083, China
Junbao Zhuo
University of Science and Technology Beijing
Domain adaptation, transfer learning, deep learning, cross-modal retrieval
Huimin Ma
Associate Professor, Department of Electronic Engineering, Tsinghua University