AI Summary
This study addresses the limitation that existing brain-to-text systems largely overlook affective content, while label-prompted language models rely on discrete emotion labels that fail to capture individualized emotional nuances. To overcome this, the authors propose EmoMind, described as the first end-to-end framework for generating personalized affective captions directly from fMRI signals. EmoMind first decodes a neutral semantic description, then rewrites it using a continuous 34-dimensional emotion vector decoded from the same fMRI recording. The rewriter is trained with classifier-free guidance against an identity-preserving null branch, which enables smooth interpolation between semantic fidelity and affective expressivity. Evaluated on two independent emotion fMRI datasets, EmoMind significantly outperforms a GPT-4 baseline prompted with discrete emotion labels on subject-specificity, structural geometry, and causal control, with the largest gains on metrics that depend on person-specific affective structure.
Abstract
Decoding visual experience from brain activity has advanced substantially, but current brain-to-text systems largely recover semantic content while discarding affect. Additionally, language models can generate emotional text when prompted with categorical labels, but such labels collapse rich inter-subject variability into coarse discrete bins. We present EmoMind, the first end-to-end pipeline for decoding affective captions directly from fMRI signals. EmoMind first retrieves a semantically grounded neutral scene description from brain-decoded visual features, then rewrites it using a continuous 34-dimensional emotion vector decoded from the same fMRI recording. To control the balance between content preservation and affective expression, we train the rewriter with classifier-free guidance against an identity-preserving null branch, enabling smooth interpolation between semantic fidelity and affective expressivity. We evaluate affective caption generation with a three-axis validation framework spanning subject-specificity, structural geometry, and causal control. We further augment this framework with a synthetic-brain substitution test that probes robustness to the measurement apparatus, and we benchmark each axis against GPT-4 prompted with brain-decoded top-5 emotion labels as a strong discrete baseline. Across two independent emotion fMRI datasets, EmoMind significantly outperforms label-prompted GPT-4 on all three axes, with the largest gains on metrics that require person-specific affective structure rather than population-level emotion aggregation. These results establish continuous brain-decoded affect as a viable control signal for individualized affective caption generation and open new directions for studying individual affective brain organisation.
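The interpolation that classifier-free guidance provides can be illustrated with a minimal sketch. It assumes the standard guidance rule, in which next-token logits from a null branch and an emotion-conditioned branch are combined linearly under a guidance scale; the abstract does not state the exact formulation EmoMind uses, and the function name `cfg_blend`, the toy logits, and the three-word vocabulary are all hypothetical.

```python
from typing import List, Sequence


def cfg_blend(
    null_logits: Sequence[float],
    cond_logits: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine two logit vectors with the standard classifier-free guidance rule.

    scale = 0 returns the null-branch logits unchanged,
    scale = 1 returns the conditional logits,
    scale > 1 extrapolates beyond the conditional branch.
    """
    if len(null_logits) != len(cond_logits):
        raise ValueError("logit vectors must have the same length")
    return [n + scale * (c - n) for n, c in zip(null_logits, cond_logits)]


# Toy next-token logits over a hypothetical 3-word vocabulary (assumed values).
null_logits = [2.0, 0.5, -1.0]  # null branch: favours keeping the neutral caption
cond_logits = [1.0, 1.5, 0.0]   # branch conditioned on the decoded emotion vector

for scale in (0.0, 0.5, 1.0, 2.0):
    print(scale, cfg_blend(null_logits, cond_logits, scale))
```

Under this assumed rule, a scale of zero reproduces the null branch, which is what an identity-preserving null branch would make equivalent to the neutral caption, and larger scales shift the output toward affective expression.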