AI Summary
Micro-expression recognition faces the challenge of jointly modeling localized muscle movements and global facial dependencies. To address this, we propose a multi-receptive-field hierarchical architecture featuring a novel Local-Global Feature Integration stage that enables progressive modeling, from fine-grained action perception to holistic emotion understanding. We design an asymmetric multi-scan strategy to enhance spatial awareness and introduce a dual-granularity classification module to decouple the high similarity among negative classes. Our method integrates MambaVision Mixer, lightweight self-attention, non-overlapping window-based local feature extraction, and multi-scale spatiotemporal scanning. Evaluated on the benchmark CASME II and SAMM datasets, it achieves state-of-the-art performance. Ablation studies comprehensively validate the effectiveness and necessity of each component.
Abstract
Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, holding significant potential in psychological diagnosis and criminal investigations. Despite notable advances in automatic ME recognition (MER), existing methods still struggle to jointly capture localized muscle activations and global facial dependencies, both critical for recognizing subtle emotional cues. To tackle this challenge, we propose MERba, a novel multi-receptive field architecture tailored for MER. MERba introduces a series of Local-Global Feature Integration stages, where fine-grained motion features are first extracted by local extractors containing MambaVision Mixers within non-overlapping windows, and then global dependencies across these regions are modeled via lightweight self-attention layers. This hierarchical design enables a progressive transition from localized perception to holistic facial understanding. Furthermore, we introduce an asymmetric multi-scanning strategy to eliminate redundant scanning directions and enhance local spatial perception. To address the high inter-class similarity among negative MEs, we introduce a Dual-Granularity Classification Module that decouples the recognition process into a coarse-to-fine paradigm. Experiments on two benchmark MER datasets demonstrate that MERba outperforms existing methods, with ablation studies confirming the effectiveness of each proposed component.
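The coarse-to-fine idea behind the Dual-Granularity Classification Module can be illustrated with a small sketch. The abstract does not give the module's internals, so the following is only one plausible reading, assuming a coarse head over broad emotion groups and a fine head that separates the negative classes, combined as a product of probabilities. The label names, the `coarse_to_fine` function, and the combination rule are illustrative assumptions, not MERba's actual implementation.

```python
import math

# Hypothetical label layout (illustrative placeholders, not taken from the paper).
COARSE = ["negative", "positive", "surprise"]
FINE_NEGATIVE = ["anger", "disgust", "sadness"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coarse_to_fine(coarse_logits, fine_logits):
    """Combine a coarse head and a fine (negative-only) head.

    Assumption: P(fine class) = P(coarse = negative) * P(fine class | negative).
    Non-negative coarse groups keep their coarse probability unchanged.
    """
    p_coarse = dict(zip(COARSE, softmax(coarse_logits)))
    p_fine = softmax(fine_logits)
    joint = {}
    for name, p in p_coarse.items():
        if name == "negative":
            # Split the negative mass across the fine-grained negative classes.
            for fine_name, pf in zip(FINE_NEGATIVE, p_fine):
                joint[fine_name] = p * pf
        else:
            joint[name] = p
    return joint


# Example logits (made up for illustration only).
probs = coarse_to_fine([2.0, 0.5, 0.1], [0.2, 1.5, 0.3])
prediction = max(probs, key=probs.get)
```

Under this reading, the coarse head only has to separate negative expressions from the rest, while the fine head concentrates on the harder distinctions among visually similar negative classes. The combined probabilities still sum to one, so `prediction` can be read off as an ordinary arg-max.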