Bridging the behavior-neural gap: A multimodal AI reveals the brain's geometry of emotion more accurately than human self-reports

📅 2025-09-29
🤖 AI Summary
This study addresses the "behavior-neural gap": the limited capacity of subjective self-reported affect to predict neural activity. We use a multimodal large language model (MLLM) as a cognitive agent, collecting millions of triplet odd-one-out similarity judgments over emotionally evocative videos and deriving from them a 30-dimensional, sensory-grounded affective embedding space. The MLLM-derived representations outperform both a language-only large language model (LLM) and features based on traditional self-report scales in predicting fMRI responses in emotion-related brain regions, including the amygdala and anterior cingulate cortex. The geometric structure of the MLLM affective space also aligns more closely with neural representational geometry. To our knowledge, this is the first work to show that sensory-grounded multimodal AI representations can surpass human behavioral reports in neural predictivity, offering a new paradigm for decoding the neurocomputational mechanisms of emotion.

📝 Abstract
The ability to represent emotion plays a significant role in human cognition and social interaction, yet the high-dimensional geometry of this affective space and its neural underpinnings remain debated. A key challenge, the "behavior-neural gap," is the limited ability of human self-reports to predict brain activity. Here we test the hypothesis that this gap arises from the constraints of traditional rating scales and that large-scale similarity judgments can more faithfully capture the brain's affective geometry. Using AI models as "cognitive agents," we collected millions of triplet odd-one-out judgments from a multimodal large language model (MLLM) and a language-only model (LLM) in response to 2,180 emotionally evocative videos. We found that the emergent 30-dimensional embeddings from these models are highly interpretable and organize emotion primarily along categorical lines, yet in a blended fashion that incorporates dimensional properties. Most remarkably, the MLLM's representation predicted neural activity in human emotion-processing networks with the highest accuracy, outperforming not only the LLM but also, counterintuitively, representations derived directly from human behavioral ratings. This result supports our primary hypothesis and suggests that sensory grounding (learning from rich visual data) is critical for developing a truly neurally-aligned conceptual framework for emotion. Our findings provide compelling evidence that MLLMs can autonomously develop rich, neurally-aligned affective representations, offering a powerful paradigm to bridge the gap between subjective experience and its neural substrates. Project page: https://reedonepeck.github.io/ai-emotion.github.io/.
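
The triplet odd-one-out task named in the abstract can be illustrated with a small sketch. It assumes a common formulation in which the most similar pair in a triplet is kept together and the remaining item is the odd one, with similarity measured as a dot product and turned into probabilities by a softmax. The abstract does not spell out the model used, so the function names, the similarity measure, and the toy 3-dimensional vectors below are all illustrative assumptions, not the authors' implementation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def odd_one_out_probs(x_i, x_j, x_k):
    """Return [P(i is odd), P(j is odd), P(k is odd)] for one triplet.

    Assumption: the most similar pair stays together, so the item left
    out of that pair is the odd one. Similarity is a plain dot product.
    """
    # If pair (j, k) is the closest, i is the odd one, and so on.
    sims = (dot(x_j, x_k), dot(x_i, x_k), dot(x_i, x_j))
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings with hypothetical values (not taken from the paper).
amusement = [0.9, 0.1, 0.0]
joy = [0.8, 0.2, 0.0]
fear = [0.0, 0.1, 0.9]

probs = odd_one_out_probs(amusement, joy, fear)
odd_index = max(range(3), key=probs.__getitem__)  # 2, i.e. "fear"
```

In this toy case the two positive items share most of their weight on the first dimension, so the third item is the most likely odd one. Fitting an embedding would go the other way: adjust the vectors so that these probabilities match the collected judgments.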
Problem

Research questions and friction points this paper is trying to address.

Human self-reports poorly predict brain activity in emotion processing
Traditional rating scales inadequately capture brain's affective geometry
Subjective experience and neural substrates remain disconnected in emotion research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal AI models generate emotion embeddings from videos
AI representations predict neural activity better than human ratings
Sensory grounding enables neural alignment of affective concepts
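
One common way to compare the geometry of a model embedding with the geometry of neural responses is representational similarity analysis: compute pairwise distances between stimuli in each space and rank-correlate the two distance profiles. The sketch below is a minimal stdlib version of that idea. Whether the paper uses this exact procedure is an assumption, and the function names and toy data are hypothetical.

```python
import math
from itertools import combinations

def rdm_upper(vectors):
    """Pairwise Euclidean distances between stimuli (upper triangle only)."""
    return [math.dist(u, v) for u, v in combinations(vectors, 2)]

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rsa_score(model_vectors, brain_vectors):
    """Spearman correlation between the two pairwise-distance profiles."""
    return pearson(ranks(rdm_upper(model_vectors)), ranks(rdm_upper(brain_vectors)))

# Hypothetical toy data: 4 stimuli, a 2-D model embedding and 3-voxel responses.
# The "brain" vectors are a scaled copy of the model vectors, so geometry matches.
model = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
brain = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [6.0, 6.0, 0.0]]
score = rsa_score(model, brain)
```

Because the score depends only on the rank order of distances, it is insensitive to the overall scale or dimensionality of either space, which is what makes a 30-dimensional embedding comparable to voxel patterns.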
Changde Du
Institute of Automation, Chinese Academy of Sciences
machine learning, computer vision, computational neuroscience, brain-computer interface (BCI), artificial intelligence
Yizhuo Lu
Institute of Automation, Chinese Academy of Sciences
artificial intelligence, neural encoding and decoding
Zhongyu Huang
State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Yi Sun
State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zisen Zhou
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
Shaozheng Qin
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, China
Huiguang He
Institute of Automation, Chinese Academy of Sciences
artificial intelligence, medical image processing, brain-computer interface