🤖 AI Summary
Existing self-supervised methods for affect recognition from multimodal physiological signals (e.g., EEG, EDA) mostly rely on pairwise alignment objectives, which cannot characterize dependencies among more than two modalities and miss the high-order interactions that arise from coordinated responses across modalities. To address this, we propose Multimodal Functional Maximum Correlation (MFMC), a self-supervised framework explicitly designed to learn high-order joint dependencies. Our method maximizes a Dual Total Correlation (DTC) objective by deriving a tight sandwich bound and optimizing it with a trace surrogate based on Functional Maximum Correlation Analysis (FMCA). This directly captures high-order statistical dependence between central (brain) and autonomic nervous system responses without relying on pairwise contrastive losses, and yields more discriminative representations. On CEAP-360VR, our approach improves subject-dependent accuracy by 7.9 percentage points (78.9% to 86.8%) and subject-independent EDA-only accuracy by 5.6 percentage points (27.5% to 33.1%). On the most challenging subject-independent EEG split of MAHNOB-HCI, it achieves 98.2% accuracy, only 0.8 percentage points below the best-performing method.
📝 Abstract
Emotional states manifest as coordinated yet heterogeneous physiological responses across central and autonomic systems, posing a fundamental challenge for multimodal representation learning in affective computing. Learning such joint dynamics is further complicated by the scarcity and subjectivity of affective annotations, which motivates the use of self-supervised learning (SSL). However, most existing SSL approaches rely on pairwise alignment objectives, which are insufficient to characterize dependencies among more than two modalities and fail to capture higher-order interactions arising from coordinated brain and autonomic responses.
To address this limitation, we propose Multimodal Functional Maximum Correlation (MFMC), a principled SSL framework that maximizes higher-order multimodal dependence through a Dual Total Correlation (DTC) objective. By deriving a tight sandwich bound and optimizing it with a trace surrogate based on functional maximum correlation analysis (FMCA), MFMC captures joint multimodal interactions directly, without relying on pairwise contrastive losses.
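To make the DTC quantity concrete, the sketch below evaluates it in closed form for jointly Gaussian modalities, using the standard identity DTC = Σ_i H(X_{-i}) − (n − 1) H(X_1, …, X_n), where X_{-i} denotes all modalities except the i-th. This is only an illustration of what the objective measures, not the MFMC estimator, which according to the abstract is optimized through an FMCA-based trace surrogate. The function name `gaussian_dtc` and its parameters are hypothetical, and the Gaussian assumption is ours.

```python
import numpy as np


def gaussian_dtc(cov, dims):
    """Dual total correlation (in nats) of jointly Gaussian modalities.

    cov  : joint covariance matrix of all modalities stacked together.
    dims : feature dimension of each modality, in stacking order.

    Illustrative assumption: the data are Gaussian, so every entropy has a
    closed form and the (2*pi*e) constants cancel, leaving
        DTC = 0.5 * (sum_i logdet(cov_without_i) - (n - 1) * logdet(cov)).
    """
    cov = np.asarray(cov, dtype=float)
    n = len(dims)
    edges = np.cumsum([0] + list(dims))
    if n < 2 or edges[-1] != cov.shape[0] or cov.shape[0] != cov.shape[1]:
        raise ValueError("need >= 2 modalities and dims matching a square cov")

    _, logdet_joint = np.linalg.slogdet(cov)

    logdet_rest_sum = 0.0
    for i in range(n):
        # Indices of every feature that does NOT belong to modality i.
        keep = np.concatenate(
            [np.arange(0, edges[i]), np.arange(edges[i + 1], edges[-1])]
        )
        _, logdet_rest = np.linalg.slogdet(cov[np.ix_(keep, keep)])
        logdet_rest_sum += logdet_rest

    return 0.5 * (logdet_rest_sum - (n - 1) * logdet_joint)


# Three scalar "modalities" with a shared pairwise correlation (toy values).
rho = 0.5
toy_cov = np.full((3, 3), rho) + (1.0 - rho) * np.eye(3)
dtc_three = gaussian_dtc(toy_cov, dims=[1, 1, 1])
```

With two modalities the quantity reduces to ordinary mutual information, which is why pairwise objectives are a special case. With three or more it measures dependence shared jointly across all modalities, and it is zero when the modalities are independent.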
Experiments on three public affective computing benchmarks demonstrate that MFMC consistently achieves state-of-the-art or competitive performance under both subject-dependent and subject-independent evaluation protocols, highlighting its robustness to inter-subject variability. In particular, MFMC improves subject-dependent accuracy on CEAP-360VR from 78.9% to 86.8%, and subject-independent accuracy from 27.5% to 33.1% using the EDA signal alone. Moreover, MFMC remains within 0.8 percentage points of the best-performing method on the most challenging EEG subject-independent split of MAHNOB-HCI. Our code is available at https://github.com/DY9910/MFMC.