MERba: Multi-Receptive Field MambaVision for Micro-Expression Recognition

2025-06-17
AI Summary
Micro-expression recognition faces the challenge of jointly modeling localized muscle movements and global facial dependencies. To address this, we propose a multi-receptive-field hierarchical architecture featuring a novel Local-Global Feature Integration stage that enables progressive modeling, from fine-grained action perception to holistic emotion understanding. We design an asymmetric multi-scan strategy to enhance spatial awareness and introduce a dual-granularity classification module to separate the highly similar negative classes. Our method integrates MambaVision Mixers, lightweight self-attention, non-overlapping window-based local feature extraction, and multi-scale spatiotemporal scanning. Evaluated on the CASME II and SAMM benchmark datasets, it achieves state-of-the-art performance. Ablation studies validate the effectiveness and necessity of each component.

Abstract
Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, holding significant potential in psychological diagnosis and criminal investigations. Despite notable advances in automatic ME recognition (MER), existing methods still struggle to jointly capture localized muscle activations and global facial dependencies, both critical for recognizing subtle emotional cues. To tackle this challenge, we propose MERba, a novel multi-receptive field architecture tailored for MER. MERba introduces a series of Local-Global Feature Integration stages, where fine-grained motion features are first extracted by local extractors containing MambaVision Mixers within non-overlapping windows, and then global dependencies across these regions are modeled via lightweight self-attention layers. This hierarchical design enables a progressive transition from localized perception to holistic facial understanding. Furthermore, we introduce an asymmetric multi-scanning strategy to eliminate redundant scanning directions and enhance local spatial perception. To address the high inter-class similarity among negative MEs, we introduce a Dual-Granularity Classification Module that decouples the recognition process into a coarse-to-fine paradigm. Experiments on two benchmark MER datasets demonstrate that MERba outperforms existing methods, with ablation studies confirming the effectiveness of each proposed component.
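The local-then-global flow described in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `window_partition` and `local_then_global` are hypothetical names, the per-window mean merely stands in for a local extractor (the paper uses MambaVision Mixers), and adding the mean of all window tokens merely stands in for the lightweight self-attention step.

```python
def window_partition(grid, win):
    """Split an H x W grid (list of rows) into non-overlapping win x win windows.

    Windows are returned in row-major order; each window is a flat list.
    """
    h, w = len(grid), len(grid[0])
    if h % win or w % win:
        raise ValueError("grid height and width must be divisible by win")
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            windows.append([grid[r][c]
                            for r in range(top, top + win)
                            for c in range(left, left + win)])
    return windows


def local_then_global(grid, win):
    """Toy two-step pass: summarize each window, then mix across windows.

    Assumption: the mean is a placeholder for a learned local extractor, and
    the shared global mean is a placeholder for attention across windows.
    """
    local_tokens = [sum(wd) / len(wd) for wd in window_partition(grid, win)]
    global_context = sum(local_tokens) / len(local_tokens)
    return [token + global_context for token in local_tokens]


grid = [[r * 4 + c for c in range(4)] for r in range(4)]
print(window_partition(grid, 2))
print(local_then_global(grid, 2))
```

The point of the sketch is the ordering: each window is processed in isolation first, and information only crosses window boundaries in the second step.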
Problem

Research questions and friction points this paper is trying to address.

Capturing localized and global facial features for micro-expression recognition
Reducing redundancy in scanning directions for spatial perception
Addressing high inter-class similarity in negative micro-expressions
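To make the idea of scanning directions concrete, the sketch below enumerates four common ways to flatten a 2-D grid into a 1-D sequence and then keeps only a subset of them. It is an assumption-laden illustration: the names `scan_orders` and `select_scans` are hypothetical, and the subset kept in the example is arbitrary, since this summary does not say which directions MERba retains.

```python
def scan_orders(h, w):
    """Return four candidate visiting orders over an h x w grid of (row, col) cells."""
    row_major = [(r, c) for r in range(h) for c in range(w)]
    col_major = [(r, c) for c in range(w) for r in range(h)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


def select_scans(orders, keep):
    """Keep only the named scan directions (a hypothetical pruning step)."""
    return {name: orders[name] for name in keep}


orders = scan_orders(2, 3)
# Arbitrary example subset, not the paper's choice.
kept = select_scans(orders, ["row_forward", "col_backward"])
for name, order in kept.items():
    print(name, order)
```

Dropping directions shortens the total sequence length a state-space model must process, which is one plausible reading of "eliminating redundant scanning directions".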
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local-Global Feature Integration stages
Asymmetric multi-scanning strategy
Dual-Granularity Classification Module
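A coarse-to-fine decision of the kind the Dual-Granularity Classification Module describes might look roughly like the following. This is a minimal sketch, not the paper's implementation: `coarse_to_fine` is a hypothetical function, the label names are placeholders rather than the dataset's class taxonomy, and the logits are made-up inputs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine(coarse_logits, fine_logits, coarse_labels, fine_labels, split_label):
    """Pick a coarse class first; refine only if it is the grouped class.

    Assumption: one coarse class (split_label) groups several similar
    fine-grained classes, and a second head separates them.
    """
    coarse_probs = softmax(coarse_logits)
    coarse_pred = coarse_labels[coarse_probs.index(max(coarse_probs))]
    if coarse_pred != split_label:
        return coarse_pred
    fine_probs = softmax(fine_logits)
    return fine_labels[fine_probs.index(max(fine_probs))]


# Placeholder labels and logits for illustration only.
coarse_labels = ["negative", "other"]
fine_labels = ["negative_a", "negative_b"]
print(coarse_to_fine([2.0, 0.1], [0.3, 1.2], coarse_labels, fine_labels, "negative"))
print(coarse_to_fine([0.1, 2.0], [0.3, 1.2], coarse_labels, fine_labels, "negative"))
```

Splitting the decision this way lets the fine head specialize in the hard distinctions among similar classes instead of competing with the easier coarse decision.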
Authors

Xinglong Mao
State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Shifeng Liu
State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Sirui Zhao
University of Science and Technology of China
Affective Computing, MLLM, HCI
Tong Xu
State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
Enhong Chen
University of Science and Technology of China
data mining, recommender system, machine learning