🤖 AI Summary
To address poor generalization in cross-dataset (cross-corpus) EEG-based emotion recognition, this paper proposes the Soft Contrastive Masked Modeling (SCMM) framework. SCMM introduces a short-term emotional continuity prior, explicitly modeling emotional dynamics through a soft-weighted contrastive learning mechanism and a similarity-driven feature aggregation strategy over sample pairs. In addition, a hybrid masking strategy is designed to strengthen the modeling of local temporal structure in EEG signals. By combining self-supervised representation learning with robust temporal modeling, SCMM substantially improves cross-dataset transferability. Extensive cross-dataset evaluations on SEED, SEED-IV, and DEAP show an average accuracy improvement of 4.26% over the second-best method, achieving state-of-the-art performance.
📝 Abstract
Emotion recognition using electroencephalography (EEG) signals has garnered widespread attention in recent years. However, existing studies have struggled to develop a model generalized enough to work across different datasets without re-training (cross-corpus), because distribution differences across datasets far exceed intra-dataset variability. To address this problem, we propose a novel Soft Contrastive Masked Modeling (SCMM) framework. Inspired by emotional continuity, SCMM integrates soft contrastive learning with a new hybrid masking strategy to effectively mine the "short-term continuity" characteristic inherent in human emotions. During self-supervised learning, soft weights are assigned to sample pairs, enabling adaptive learning of similarity relationships across samples. Furthermore, we introduce an aggregator that performs similarity-weighted aggregation of complementary information from multiple close samples to enhance fine-grained feature representation, which is then used to reconstruct the original sample. Extensive experiments on the SEED, SEED-IV, and DEAP datasets show that SCMM achieves state-of-the-art (SOTA) performance, outperforming the second-best method by an average accuracy of 4.26% under two types of cross-corpus conditions (same-class and different-class) for EEG-based emotion recognition.
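The similarity-driven aggregation described above can be sketched in a few lines: pairwise similarities between sample features are converted into soft weights, and each sample's representation is rebuilt as a weighted combination of its close neighbors. This is an illustrative sketch only; the function name, the cosine-similarity choice, and the softmax temperature are assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_aggregate(features, temperature=0.5):
    """Illustrative similarity-driven feature aggregation (not the
    authors' exact method). Each sample's output is a weighted sum of
    other samples' features, with weights given by a temperature-scaled
    softmax over pairwise cosine similarities."""
    # L2-normalize rows so dot products equal cosine similarities
    norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = norm @ norm.T                            # (N, N) pairwise similarities
    np.fill_diagonal(sim, -np.inf)                 # exclude self-pairs
    weights = np.exp(sim / temperature)            # soft weights per sample pair
    weights /= weights.sum(axis=1, keepdims=True)  # rows sum to 1
    return weights @ features                      # weighted aggregation

# Toy usage: 4 samples with 3-dim features; samples 0/1 and 2/3 are close
feats = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.9, 0.1]])
agg = soft_aggregate(feats)
print(agg.shape)  # (4, 3)
```

In the full framework, the aggregated representation would feed the reconstruction objective, so that reconstruction draws on complementary information from similar (temporally close, same-emotion) samples rather than from the masked input alone.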