🤖 AI Summary
To address the challenges of strong noise, significant inter-subject variability, and insufficient modeling of dynamic multimodal synchronization in EEG-based emotion recognition, this paper proposes a temporal and cross-modal contrastive learning framework grounded in physiological synchrony mechanisms. We introduce a Cross-Modal Consistency Alignment (CM-CA) module that explicitly models semantic consistency and dynamic synchronization between EEG and peripheral physiological signals (e.g., GSR) across multiple time scales. Additionally, we propose Long- and Short-Term Temporal Contrastive Learning (LS-TCL) to capture emotion-related temporal dynamics in peripheral physiological signals at varying temporal resolutions. The method integrates hierarchical feature fusion with a pre-training and fine-tuning paradigm. Evaluated on the DEAP and DREAMER datasets, our approach achieves state-of-the-art performance under both unimodal (EEG-only) and cross-modal settings, demonstrating substantial improvements in robustness and generalizability.
📝 Abstract
Electroencephalography (EEG) signals provide a direct, involuntary reflection of brain activity related to emotional states, offering significant advantages over behavioral cues such as facial expressions. However, EEG signals are noisy, prone to artifacts, and vary considerably across individuals, which complicates emotion recognition. While multimodal approaches have used Peripheral Physiological Signals (PPS) such as galvanic skin response (GSR) to complement EEG, they often overlook the dynamic synchronization and semantic consistency between modalities. Moreover, the temporal dynamics of emotional fluctuations at different time resolutions in PPS remain underexplored. To address these challenges, we propose PhysioSync, a novel pre-training framework that leverages temporal and cross-modal contrastive learning, inspired by physiological synchronization phenomena. PhysioSync incorporates Cross-Modal Consistency Alignment (CM-CA) to model dynamic relationships between EEG and complementary PPS, enabling it to capture emotion-related synchronization across modalities. It further introduces Long- and Short-Term Temporal Contrastive Learning (LS-TCL) to capture emotional synchronization at different temporal resolutions within each modality. After pre-training, cross-resolution and cross-modal features are hierarchically fused and fine-tuned to enhance emotion recognition. Experiments on the DEAP and DREAMER datasets show that PhysioSync achieves strong performance under both uni-modal and cross-modal conditions, highlighting its effectiveness for EEG-centered emotion recognition.
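The abstract does not give the exact form of the CM-CA objective, but cross-modal consistency alignment is commonly realized as a symmetric InfoNCE loss: EEG and PPS embeddings from the same time window form positive pairs, and all other pairings in the batch act as negatives. The sketch below is a hypothetical illustration under that assumption (function name, embedding dimension, and temperature are illustrative, not from the paper):

```python
import numpy as np

def cross_modal_infonce(eeg_emb, pps_emb, temperature=0.1):
    """Symmetric InfoNCE over time-aligned EEG/PPS embedding pairs.

    A minimal sketch of cross-modal consistency alignment (hypothetical;
    the paper's CM-CA details are not specified in the abstract).
    Positives are the diagonal pairs (same time window); all off-diagonal
    pairs in the batch serve as negatives.
    """
    # L2-normalize so the dot product is cosine similarity
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    pps = pps_emb / np.linalg.norm(pps_emb, axis=1, keepdims=True)
    logits = eeg @ pps.T / temperature  # (B, B) similarity matrix

    def nce(lg):
        # log-softmax over each row, then negative log-likelihood of the diagonal
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # symmetric: EEG -> PPS and PPS -> EEG directions
    return 0.5 * (nce(logits) + nce(logits.T))

rng = np.random.default_rng(0)
eeg = rng.normal(size=(8, 64))  # batch of 8 EEG window embeddings
pps = rng.normal(size=(8, 64))  # matching PPS window embeddings
loss = cross_modal_infonce(eeg, pps)
```

Minimizing this loss pulls each EEG embedding toward its synchronous PPS embedding while pushing it away from other windows, which is one standard way to encode the "dynamic synchronization" idea the abstract describes.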
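Similarly, the abstract describes LS-TCL only at a high level. One plausible reading is that, within a modality, the embedding of a long segment is contrasted against an aggregate of its own short-segment embeddings, with other trials in the batch as negatives. The sketch below illustrates that reading; the function name, pooling choice (mean), and shapes are assumptions, not details from the paper:

```python
import numpy as np

def long_short_temporal_contrastive(long_emb, short_embs, temperature=0.1):
    """Contrast long-window embeddings against pooled short-window embeddings.

    Hypothetical sketch of long-/short-term temporal contrastive learning:
    each trial's long-segment embedding should match the mean of its own
    short-segment embeddings (positive) and differ from other trials'
    (negatives).  long_emb: (B, D); short_embs: (B, S, D).
    """
    pooled = short_embs.mean(axis=1)  # (B, D) aggregate of short windows
    pooled = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
    long_n = long_emb / np.linalg.norm(long_emb, axis=1, keepdims=True)

    logits = long_n @ pooled.T / temperature  # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # diagonal = same-trial positives

rng = np.random.default_rng(0)
long_emb = rng.normal(size=(8, 64))       # one long window per trial
short_embs = rng.normal(size=(8, 4, 64))  # 4 short windows per trial
ls_loss = long_short_temporal_contrastive(long_emb, short_embs)
```

Contrasting across resolutions in this way encourages representations that are stable over a long window yet consistent with the finer-grained dynamics of its short windows, matching the abstract's notion of emotional synchronization at different temporal resolutions.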