MMME: A Spontaneous Multi-Modal Micro-Expression Dataset Enabling Visual-Physiological Fusion

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing micro-expression (ME) research relies solely on the visual modality, overlooking affective cues embedded in physiological signals and thereby limiting recognition and spotting performance. To address this, we introduce the first millisecond-synchronized visual–physiological multimodal ME dataset, comprising facial videos alongside six physiological signal types: EEG, PPG, RSP, SKT, EDA, and ECG. The dataset contains 634 micro-expressions, 2,841 macro-expressions, and 2,890 synchronized multimodal samples. We propose a novel temporal alignment algorithm and a cross-modal feature fusion framework to establish a visual–physiological collaborative modeling mechanism. Experimental results demonstrate that integrating physiological signals improves ME recognition accuracy by 12.7% and spotting F1-score by 15.3%. This work advances ME analysis from a unimodal to a multimodal fusion paradigm, enabling more robust and fine-grained affective inference.
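The summary mentions a temporal alignment algorithm but does not describe it. As a minimal, hypothetical sketch of what millisecond-level synchronization involves, the snippet below resamples an irregularly clocked physiological channel onto video frame timestamps by linear interpolation; the function name, sampling rates, and approach are illustrative assumptions, not the authors' method.

```python
import numpy as np

def align_signal_to_frames(sig_t_ms, sig_vals, frame_t_ms):
    """Resample one physiological channel onto video frame timestamps.

    sig_t_ms   : 1-D array of signal sample times in milliseconds
    sig_vals   : 1-D array of signal values, same length as sig_t_ms
    frame_t_ms : 1-D array of video frame timestamps in milliseconds
    Returns one interpolated signal value per video frame.
    """
    return np.interp(frame_t_ms, sig_t_ms, sig_vals)

# Toy example: a 256 Hz EEG-like channel aligned to a 200 fps camera over 1 s.
eeg_t = np.arange(0, 1000, 1000 / 256)         # EEG sample times (ms)
eeg_v = np.sin(2 * np.pi * 10 * eeg_t / 1000)  # synthetic 10 Hz rhythm
frame_t = np.arange(0, 1000, 1000 / 200)       # camera frame times (ms)
eeg_per_frame = align_signal_to_frames(eeg_t, eeg_v, frame_t)
print(eeg_per_frame.shape)  # (200,): one EEG value per video frame
```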

📝 Abstract
Micro-expressions (MEs) are subtle, fleeting nonverbal cues that reveal an individual's genuine emotional state. Their analysis has attracted considerable interest due to its promising applications in fields such as healthcare, criminal investigation, and human-computer interaction. However, existing ME research is limited to a single visual modality, overlooking the rich emotional information conveyed by other physiological modalities, so recognition and spotting performance falls far short of practical needs. Exploring the cross-modal association between ME visual features and physiological signals (PS), and developing a multimodal fusion framework, is therefore a pivotal step toward advancing ME analysis. This study introduces a novel ME dataset, MMME, which for the first time enables synchronized collection of facial action signals (MEs), central nervous system signals (EEG), and peripheral PS (PPG, RSP, SKT, EDA, and ECG). Overcoming the constraints of existing ME corpora, MMME comprises 634 MEs, 2,841 macro-expressions (MaEs), and 2,890 trials of synchronized multimodal PS, establishing a robust foundation for investigating ME neural mechanisms and conducting multimodal fusion-based analyses. Extensive experiments validate the dataset's reliability and provide benchmarks for ME analysis, demonstrating that integrating MEs with PS significantly enhances recognition and spotting performance. To the best of our knowledge, MMME is the most comprehensive ME dataset to date in terms of modality diversity. It provides critical data support for exploring the neural mechanisms of MEs and uncovering visual-physiological synergies, driving a paradigm shift in ME research from single-modality visual analysis to multimodal fusion. The dataset will be publicly available upon acceptance of this paper.
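Neither the abstract nor this page specifies the fusion architecture. The sketch below shows one common baseline, late fusion by per-modality standardization and feature concatenation; all names and dimensions are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def zscore(x, eps=1e-8):
    """Standardize each feature column so no modality dominates by scale."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def fuse_features(visual_feats, physio_feats):
    """Late fusion: normalize each modality, then concatenate per trial.

    visual_feats : (n_trials, d_vis) array, e.g. optical-flow descriptors
    physio_feats : (n_trials, d_phys) array, e.g. per-channel statistics
    """
    return np.hstack([zscore(visual_feats), zscore(physio_feats)])

# Toy usage: 10 trials with 64-dim visual and 32-dim physiological features.
rng = np.random.default_rng(0)
fused = fuse_features(rng.normal(size=(10, 64)), rng.normal(size=(10, 32)))
print(fused.shape)  # (10, 96): input to any downstream ME classifier
```

The fused matrix would then feed a standard classifier for ME recognition, or a sliding-window detector for spotting.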
Problem

Research questions and friction points this paper is trying to address.

Exploring cross-modal association between micro-expressions and physiological signals
Developing a multimodal fusion framework for micro-expression analysis
Overcoming limitations of single-modality datasets in micro-expression research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synchronized multi-modal data collection
Visual-physiological fusion framework
Comprehensive ME dataset MMME (a hypothetical per-trial layout is sketched after this list)
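Since the dataset's release format is not yet public, the record below is a purely hypothetical sketch of how one synchronized trial could be organized; every field name is an illustrative assumption.

```python
from dataclasses import dataclass, field
import numpy as np

def _empty():
    """Placeholder default for a not-yet-loaded 1-D signal."""
    return np.empty(0)

@dataclass
class MMMETrial:
    """Hypothetical container for one synchronized MMME trial."""
    video_path: str      # facial video clip
    emotion_label: str   # e.g. "happiness", "surprise"
    onset_frame: int     # ME onset index within the video
    apex_frame: int      # ME apex index
    offset_frame: int    # ME offset index
    eeg: np.ndarray = field(default_factory=lambda: np.empty((0, 0)))  # (channels, samples)
    ppg: np.ndarray = field(default_factory=_empty)  # photoplethysmography
    rsp: np.ndarray = field(default_factory=_empty)  # respiration
    skt: np.ndarray = field(default_factory=_empty)  # skin temperature
    eda: np.ndarray = field(default_factory=_empty)  # electrodermal activity
    ecg: np.ndarray = field(default_factory=_empty)  # electrocardiogram
```

Per the paper's millisecond synchronization, all six channels in such a record would share a common trial clock with the video.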
👥 Authors
Chuang Ma
Center of Bioinformatics, College of Life Sciences, Northwest A&F University, China
Big data · machine learning · bioinformatics · Epitranscriptomics · Smart Breeding
Yu Pei
The Defense Innovation Institute, Academy of Military Sciences, Beijing, China
Jianhang Zhang
The Defense Innovation Institute, Academy of Military Sciences, Beijing, China
Shaokai Zhao
The Defense Innovation Institute, Academy of Military Sciences, Beijing, China
Bowen Ji
Northwestern Polytechnical University, Xi’an, China
Liang Xie
Wuhan University of Technology
Time Series Forecasting · Cross-modal Learning
Ye Yan
The Defense Innovation Institute, Academy of Military Sciences, Beijing, China
Erwei Yin
The Defense Innovation Institute, Academy of Military Sciences, Beijing, China