Toward a Realistic Encoding Model of Auditory Affective Understanding in the Brain

📅 2025-09-23
🤖 AI Summary
This study addresses an open question: which dynamic neural mechanisms underlie the emotional arousal elicited by complex auditory stimuli? It proposes a neuroscience-inspired, multi-level computational framework that combines classical acoustic features with layer-wise representations from wav2vec 2.0 and HuBERT, and maps them to behavioral emotion annotations and neural response synchrony across datasets. Key findings include: (1) intermediate layers of wav2vec 2.0/HuBERT capture emotion-eliciting features more effectively than the final semantic layers; (2) human voice preferentially engages prefrontal and temporal regions, whereas the soundtrack preferentially engages limbic regions, and the voice-versus-soundtrack bias in each dataset aligns with the stimulus energy distribution; and (3) high-level semantic features predict both behavioral emotion ratings and neural responses significantly better than low-level acoustic features (p < 0.05). Together, these results support an interpretable, multi-scale computational model of auditory affective encoding in the brain.

📝 Abstract
In affective neuroscience and emotion-aware AI, how complex auditory stimuli drive the dynamics of emotional arousal remains unresolved. This study introduces a computational framework to model the brain's encoding of naturalistic auditory inputs into dynamic behavioral and neural responses across three datasets (SEED, LIRIS, and the self-collected BAVE). Guided by the neurobiological principle of a parallel auditory hierarchy, we decompose audio into multilevel auditory features (using classical algorithms and wav2vec 2.0/HuBERT), extracted from the original audio and from its isolated human-voice and background-soundtrack components, and map them to emotion-related responses via cross-dataset analyses. Our analysis reveals that high-level semantic representations (derived from the final layer of wav2vec 2.0/HuBERT) play a dominant role in emotion encoding, outperforming low-level acoustic features with significantly stronger mappings to behavioral annotations and dynamic neural synchrony across most brain regions ($p < 0.05$). Notably, the middle layers of wav2vec 2.0/HuBERT (which balance acoustic and semantic information) surpass the final layers in emotion induction across datasets. Moreover, human voices and soundtracks show dataset-dependent emotion-evoking biases aligned with the stimulus energy distribution (e.g., LIRIS favors soundtracks due to higher background energy), and neural analyses indicate that voices dominate prefrontal/temporal activity while soundtracks dominate in limbic regions. By integrating affective computing and neuroscience, this work uncovers hierarchical mechanisms of auditory-emotion encoding, providing a foundation for adaptive emotion-aware systems and cross-disciplinary explorations of audio-affective interactions.
Problem

Research questions and friction points this paper is trying to address.

Modeling brain encoding of auditory inputs into emotional responses
Investigating hierarchical auditory features' role in emotion encoding
Comparing voice and soundtrack contributions to neural emotion processing
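
To make the voice-versus-soundtrack comparison concrete, here is a minimal sketch of how the energy share of two separated stems could be computed. It assumes the voice and soundtrack stems have already been isolated (source separation is not shown), and it uses the sum of squared samples as the energy measure. This summary does not state the paper's exact energy definition, so the function name `energy_share` and the toy sine-wave stems are illustrative assumptions only.

```python
import math


def energy_share(voice, soundtrack):
    """Return the fraction of total energy (sum of squared samples) in each stem.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    e_voice = sum(s * s for s in voice)
    e_track = sum(s * s for s in soundtrack)
    total = e_voice + e_track
    if total == 0.0:
        # Both stems are silent: no meaningful share can be assigned.
        return 0.0, 0.0
    return e_voice / total, e_track / total


# Toy stems: one second of audio at 16 kHz, with a quieter "voice" tone
# and a louder "soundtrack" tone. Real stems would come from a separation step.
sr = 16000
voice = [0.2 * math.sin(2 * math.pi * 220 * n / sr) for n in range(sr)]
soundtrack = [0.4 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

voice_share, track_share = energy_share(voice, soundtrack)
print(f"voice: {voice_share:.2f}, soundtrack: {track_share:.2f}")
```

A clip whose soundtrack share is high would, under this reading, correspond to the "higher background energy" case the abstract describes for LIRIS.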
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilevel auditory feature decomposition using wav2vec 2.0/Hubert
Cross-dataset mapping of audio features to emotional responses
Hierarchical mechanisms revealing semantic dominance in emotion encoding
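
The mapping from audio features to emotional responses can be illustrated with a small encoding-model sketch. It assumes a closed-form ridge regression scored by held-out Pearson correlation, which is a common choice for encoding models but is not confirmed by this summary as the authors' method. The feature matrices are synthetic stand-ins for layer-wise embeddings, deliberately constructed so that one set carries the target signal and the other does not. The names `ridge_fit` and `encoding_score` and all parameter values are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights: w = (X^T X + alpha * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def encoding_score(X, y, alpha=1.0, train_frac=0.8):
    """Fit on the first part of the time series; return Pearson r on the rest."""
    split = int(len(y) * train_frac)
    # Standardize features using training statistics only.
    mu = X[:split].mean(axis=0)
    sd = X[:split].std(axis=0) + 1e-8
    X_train = (X[:split] - mu) / sd
    X_test = (X[split:] - mu) / sd
    y_mean = y[:split].mean()
    w = ridge_fit(X_train, y[:split] - y_mean, alpha)
    pred = X_test @ w + y_mean
    return float(np.corrcoef(pred, y[split:])[0, 1])


# Synthetic data (an assumption for illustration, not results from the paper).
rng = np.random.default_rng(0)
n_time, n_dim = 400, 16
signal = rng.standard_normal(n_time)                 # latent time course
y = signal + 0.1 * rng.standard_normal(n_time)       # toy "arousal" target

features = {
    # Built to contain the latent signal plus noise.
    "informative": np.outer(signal, rng.standard_normal(n_dim))
    + 0.5 * rng.standard_normal((n_time, n_dim)),
    # Pure noise, unrelated to the target.
    "noise": rng.standard_normal((n_time, n_dim)),
}

scores = {name: encoding_score(X, y) for name, X in features.items()}
for name, r in scores.items():
    print(f"{name}: held-out r = {r:.3f}")
```

With real data, each entry of `features` would hold the time-aligned representation from one feature level (for example, a classical acoustic descriptor or one transformer layer), and the per-level scores could then be compared across datasets.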
Guandong Pan
School of Computer Science and Engineering, Beihang University, Beijing 100191, China, School of Artificial Intelligence, Beihang University, Beijing 100191, China, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China, Key laboratory of Mathematics, Informatics and Behavioral Semantics, Beihang University, Beijing 100191, China, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China, Zhongguancun Laboratory
Yaqian Yang
School of Computer Science and Engineering, Beihang University, Beijing 100191, China, School of Artificial Intelligence, Beihang University, Beijing 100191, China, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China, Key laboratory of Mathematics, Informatics and Behavioral Semantics, Beihang University, Beijing 100191, China, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China, Zhongguancun Laboratory
Shi Chen
School of Computer Science and Engineering, Beihang University, Beijing 100191, China, School of Artificial Intelligence, Beihang University, Beijing 100191, China, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China, Key laboratory of Mathematics, Informatics and Behavioral Semantics, Beihang University, Beijing 100191, China, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China, Zhongguancun Laboratory
Xin Wang
School of Computer Science and Engineering, Beihang University, Beijing 100191, China, School of Artificial Intelligence, Beihang University, Beijing 100191, China, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China, Key laboratory of Mathematics, Informatics and Behavioral Semantics, Beihang University, Beijing 100191, China, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China, Zhongguancun Laboratory
Longzhao Liu
School of Computer Science and Engineering, Beihang University, Beijing 100191, China, School of Artificial Intelligence, Beihang University, Beijing 100191, China, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China, Key laboratory of Mathematics, Informatics and Behavioral Semantics, Beihang University, Beijing 100191, China, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China, Zhongguancun Laboratory
Hongwei Zheng
Shanghai Jiao Tong University
Computer vision, federated learning
Shaoting Tang
School of Computer Science and Engineering, Beihang University, Beijing 100191, China, School of Artificial Intelligence, Beihang University, Beijing 100191, China, Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China, Key laboratory of Mathematics, Informatics and Behavioral Semantics, Beihang University, Beijing 100191, China, Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beihang University, Beijing 100191, China, Zhongguancun Laboratory