MERba: Multi-Receptive Field MambaVision for Micro-Expression Recognition

📅 2025-06-17

🤖 AI Summary

Micro-expression recognition requires jointly modeling localized muscle movements and global facial dependencies. To address this, the authors propose MERba, a multi-receptive-field hierarchical architecture built from Local-Global Feature Integration stages that move progressively from fine-grained action perception to holistic emotion understanding. An asymmetric multi-scanning strategy removes redundant scanning directions and sharpens local spatial perception, and a Dual-Granularity Classification Module separates the highly similar negative classes through coarse-to-fine recognition. The method combines MambaVision Mixers applied within non-overlapping windows for local feature extraction with lightweight self-attention for global dependencies across windows. Evaluated on the CASME II and SAMM benchmark datasets, it outperforms existing methods, and ablation studies confirm the effectiveness of each component.

📝 Abstract

Micro-expressions (MEs) are brief, involuntary facial movements that reveal genuine emotions, holding significant potential in psychological diagnosis and criminal investigations. Despite notable advances in automatic ME recognition (MER), existing methods still struggle to jointly capture localized muscle activations and global facial dependencies, both critical for recognizing subtle emotional cues. To tackle this challenge, we propose MERba, a novel multi-receptive field architecture tailored for MER. MERba introduces a series of Local-Global Feature Integration stages, where fine-grained motion features are first extracted by local extractors containing MambaVision Mixers within non-overlapping windows, and then global dependencies across these regions are modeled via lightweight self-attention layers. This hierarchical design enables a progressive transition from localized perception to holistic facial understanding. Furthermore, we introduce an asymmetric multi-scanning strategy to eliminate redundant scanning directions and enhance local spatial perception. To address the high inter-class similarity among negative MEs, we introduce a Dual-Granularity Classification Module that decouples the recognition process into a coarse-to-fine paradigm. Experiments on two benchmark MER datasets demonstrate that MERba outperforms existing methods, with ablation studies confirming the effectiveness of each proposed component.
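
The local-then-global flow described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the function names are hypothetical, a plain mean stands in for the local extractor (it is not a MambaVision Mixer), and a scalar softmax attention stands in for the lightweight self-attention layers. It only shows how non-overlapping windows are processed locally before information is mixed across windows.

```python
import math

def partition_windows(grid, win):
    """Split an H x W grid (list of rows) into non-overlapping win x win windows."""
    h, w = len(grid), len(grid[0])
    assert h % win == 0 and w % win == 0, "grid must tile evenly into windows"
    windows = []
    for top in range(0, h, win):
        for left in range(0, w, win):
            # Flatten each window row by row.
            windows.append([grid[r][c]
                            for r in range(top, top + win)
                            for c in range(left, left + win)])
    return windows

def local_summary(window):
    # Stand-in for the local extractor (assumption): a plain mean per window.
    return sum(window) / len(window)

def global_attention(summaries):
    # Scalar self-attention across windows (assumption): each output is a
    # softmax-weighted average of all window summaries, scores = query * key.
    out = []
    for q in summaries:
        scores = [q * k for k in summaries]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(wt * v for wt, v in zip(weights, summaries)) / z)
    return out

def local_global_stage(grid, win):
    """Local processing inside each window, then global mixing across windows."""
    windows = partition_windows(grid, win)
    return global_attention([local_summary(wd) for wd in windows])
```

Calling `local_global_stage(grid, 2)` on a 4 x 4 grid yields one value per window, four in total. Stacking several such stages with a growing receptive field would mirror the hierarchical design the abstract describes, though the actual layer configuration is not given here.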

Problem

Research questions and friction points this paper is trying to address.

Capturing localized and global facial features for micro-expression recognition

Reducing redundancy in scanning directions for spatial perception

Addressing high inter-class similarity in negative micro-expressions
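
To make "scanning directions" concrete: a state-space model reads a 2-D grid of tokens as a 1-D sequence, so each direction is just a different visiting order over the same tokens. The sketch below enumerates four common candidate orders and keeps a subset. The function names, the four candidates, and the particular subset kept are all assumptions for illustration; the summary does not say which directions MERba retains.

```python
def scan_orders(h, w):
    """Return four candidate 1-D visiting orders over an h x w token grid."""
    row_major = [(r, c) for r in range(h) for c in range(w)]
    col_major = [(r, c) for c in range(w) for r in range(h)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }

def select_scans(orders, keep):
    # Hypothetical "asymmetric" selection: keep only some of the directions
    # instead of every forward/backward pair.
    return {name: orders[name] for name in keep}

# Example: keep one row-wise and one column-wise order (an arbitrary choice).
kept = select_scans(scan_orders(2, 2), ["row_forward", "col_backward"])
```

Every order is a permutation of the same token positions, which is why some directions can carry largely overlapping information and become candidates for removal.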

Innovation

Methods, ideas, or system contributions that make the work stand out.

Local-Global Feature Integration stages

Asymmetric multi-scanning strategy

Dual-Granularity Classification Module
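
A coarse-to-fine decision of the kind the Dual-Granularity Classification Module describes can be sketched as two softmax heads whose probabilities are multiplied. This is a generic hierarchical-classification sketch, not the authors' module: the function names, the label names, and the rule for combining the heads are assumptions, and the label strings are placeholders rather than dataset classes.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def coarse_to_fine(coarse_logits, fine_logits, coarse_labels, fine_labels,
                   split="negative"):
    """Combine a coarse head with a fine head that refines one coarse class.

    The fine head is read as a distribution over subclasses of `split`, so
    P(subclass) = P(split) * P(subclass | split). Other coarse classes pass
    through unchanged.
    """
    p_coarse = dict(zip(coarse_labels, softmax(coarse_logits)))
    p_fine = softmax(fine_logits)
    joint = {k: v for k, v in p_coarse.items() if k != split}
    for label, p in zip(fine_labels, p_fine):
        joint[label] = p_coarse[split] * p
    return joint

# Placeholder labels (hypothetical): one coarse "negative" group with two subclasses.
probs = coarse_to_fine(
    coarse_logits=[0.0, 0.0, 0.0],
    fine_logits=[0.0, 0.0],
    coarse_labels=["negative", "positive", "surprise"],
    fine_labels=["negative_a", "negative_b"],
)
```

The resulting probabilities still sum to 1, and the fine head only has to distinguish the mutually similar negative subclasses rather than compete with every other class.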
