MuMTAffect: A Multimodal Multitask Affective Framework for Personality and Emotion Recognition from Physiological Signals

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of jointly modeling emotion classification and personality re-identification from short-duration physiological signals. We propose a multimodal multitask affective computing framework grounded in the Theory of Constructed Emotion, which decouples core affective representations from higher-level cognitive interpretations. A cross-modal Transformer architecture integrates pupillary response, eye movement, facial action units, and electrodermal activity (EDA); personality prediction is incorporated as an auxiliary task to promote user-specific affective embedding learning. Each modality employs a dedicated Transformer encoder, and cross-modal interactions are modeled via a fusion layer. The framework jointly optimizes valence/arousal classification and personality re-identification. Evaluation on the AFFEC dataset demonstrates that EDA significantly improves arousal recognition, while pupillary and oculomotor features enhance valence discrimination. The model exhibits strong cross-subject generalization, and its modular design enables flexible extension.

📝 Abstract
We present MuMTAffect, a novel Multimodal Multitask Affective Embedding Network designed for joint emotion classification and personality prediction (re-identification) from short physiological signal segments. MuMTAffect integrates multiple physiological modalities (pupil dilation, eye gaze, facial action units, and galvanic skin response) using dedicated transformer-based encoders for each modality and a fusion transformer to model cross-modal interactions. Inspired by the Theory of Constructed Emotion, the architecture explicitly separates core affect encoding (valence/arousal) from higher-level conceptualization, thereby grounding predictions in contemporary affective neuroscience. Personality trait prediction is leveraged as an auxiliary task to generate robust, user-specific affective embeddings, significantly enhancing emotion recognition performance. We evaluate MuMTAffect on the AFFEC dataset, demonstrating that stimulus-level emotional cues (Stim Emo) and galvanic skin response substantially improve arousal classification, while pupil and gaze data enhance valence discrimination. The inherent modularity of MuMTAffect allows effortless integration of additional modalities, ensuring scalability and adaptability. Extensive experiments and ablation studies underscore the efficacy of our multimodal multitask approach in creating personalized, context-aware affective computing systems, highlighting pathways for further advancements in cross-subject generalisation.
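To make the described architecture concrete, the following is a minimal PyTorch sketch of the general pattern: one dedicated Transformer encoder per modality, a fusion Transformer over the resulting modality tokens, and separate heads for valence/arousal classification and the auxiliary personality (re-identification) task. The names (ModalityEncoder, MuMTAffectSketch), dimensions, pooling choices, and head sizes are all illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a MuMTAffect-style network in PyTorch. Layer widths,
# pooling, and head sizes are assumptions for illustration only.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Dedicated Transformer encoder for one physiological modality."""

    def __init__(self, in_dim: int, d_model: int = 64, n_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)  # raw features -> model width
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim) -> (batch, d_model), mean-pooled over time
        return self.encoder(self.proj(x)).mean(dim=1)


class MuMTAffectSketch(nn.Module):
    """Per-modality encoders, a fusion Transformer, and three task heads."""

    def __init__(self, modality_dims: dict, d_model: int = 64, n_subjects: int = 10):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {name: ModalityEncoder(dim, d_model) for name, dim in modality_dims.items()}
        )
        fusion_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(fusion_layer, num_layers=1)
        self.valence_head = nn.Linear(d_model, 2)   # low/high valence
        self.arousal_head = nn.Linear(d_model, 2)   # low/high arousal
        self.identity_head = nn.Linear(d_model, n_subjects)  # auxiliary re-identification

    def forward(self, inputs: dict):
        # One token per modality; the fusion Transformer models cross-modal
        # interactions before pooling into a shared affective embedding.
        tokens = torch.stack([self.encoders[m](x) for m, x in inputs.items()], dim=1)
        embedding = self.fusion(tokens).mean(dim=1)
        return (self.valence_head(embedding),
                self.arousal_head(embedding),
                self.identity_head(embedding))


# Example with the four modalities named in the abstract (feature dims assumed):
model = MuMTAffectSketch({"pupil": 2, "gaze": 4, "action_units": 17, "gsr": 1})
```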
Problem

Research questions and friction points this paper is trying to address.

Joint emotion classification and personality prediction from physiological signals
Integrating multimodal physiological data using transformer-based encoders
Enhancing emotion recognition through user-specific affective embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal transformer encoders for physiological signals
Separates core affect encoding from conceptualization
Uses personality prediction as an auxiliary task (see the loss sketch after this list)
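In its simplest reading, the auxiliary task amounts to a weighted multitask objective: the shared embedding must support emotion classification while also identifying the user, which pushes it toward user-specific structure. A hedged sketch follows, assuming cross-entropy for all three outputs and an illustrative weight lambda_aux; the paper's actual loss terms and weighting are not specified here.

```python
import torch.nn.functional as F


def multitask_loss(valence_logits, arousal_logits, identity_logits,
                   valence_y, arousal_y, subject_y, lambda_aux: float = 0.3):
    """Joint objective: valence/arousal classification plus an auxiliary
    personality re-identification term. lambda_aux = 0.3 is an assumed value."""
    emotion_loss = (F.cross_entropy(valence_logits, valence_y)
                    + F.cross_entropy(arousal_logits, arousal_y))
    auxiliary_loss = F.cross_entropy(identity_logits, subject_y)
    return emotion_loss + lambda_aux * auxiliary_loss
```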
Authors

Meisam Jamshidi Seikavandi
PhD fellow at IT University of Copenhagen
computer vision, deep learning, eye tracking, human-computer interaction

Fabricio Batista Narcizo
GN Advanced Science, IT University of Copenhagen

Ted Vucurevich
GN Advanced Science

Andrew Burke Dittberner
GN Advanced Science

Paolo Burelli
Associate Professor
Artificial Intelligence, Data Mining, Computer Games