Discovering Semantic Subdimensions through Disentangled Conceptual Representations

📅 2025-08-29
🤖 AI Summary
This study investigates fine-grained substructure within coarse-grained semantic categories (e.g., “animal”, “tool”). To this end, we propose the Disentangled Continuous Semantic Representation Model (DCSRM), a data-driven approach that decomposes large language model word embeddings into multiple interpretable sub-embeddings, each encoding a distinct semantic subdimension. We identify semantic polarity (for example, along size, animacy, and valence) as a key factor driving this decomposition. Our method combines embedding disentanglement, interpretability analysis, and voxel-wise fMRI encoding models, and is validated against empirical neuroimaging data. The identified subdimensions show neural correlates in brain activation, which supports their cognitive and neuroscientific plausibility and yields a finer-grained, more interpretable account of semantic representation.

📝 Abstract
Understanding the core dimensions of conceptual semantics is fundamental to uncovering how meaning is organized in language and the brain. Existing approaches often rely on predefined semantic dimensions that offer only broad representations, overlooking finer conceptual distinctions. This paper proposes a novel framework to investigate the subdimensions underlying coarse-grained semantic dimensions. Specifically, we introduce a Disentangled Continuous Semantic Representation Model (DCSRM) that decomposes word embeddings from large language models into multiple sub-embeddings, each encoding specific semantic information. Using these sub-embeddings, we identify a set of interpretable semantic subdimensions. To assess their neural plausibility, we apply voxel-wise encoding models to map these subdimensions to brain activation. Our work offers more fine-grained interpretable semantic subdimensions of conceptual meaning. Further analyses reveal that semantic dimensions are structured according to distinct principles, with polarity emerging as a key factor driving their decomposition into subdimensions. The neural correlates of the identified subdimensions support their cognitive and neuroscientific plausibility.
Problem

Research questions and friction points this paper is trying to address.

Identifying fine-grained semantic subdimensions beyond broad predefined categories
Decomposing word embeddings to reveal interpretable conceptual distinctions
Mapping discovered subdimensions to neural representations for biological validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes large language model word embeddings into multiple sub-embeddings, each encoding specific semantic information
Identifies interpretable semantic subdimensions from these sub-embeddings
Maps subdimensions to brain activation using voxel-wise encoding models
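
As a rough illustration of the voxel-wise encoding step, the sketch below fits one linear map per voxel from a sub-embedding to brain responses and scores it on held-out words. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the regression method, so closed-form ridge regression is assumed here as a common choice for encoding models, and the data, shapes, and function names (`fit_ridge`, `voxelwise_correlation`) are hypothetical stand-ins.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; returns one weight column per voxel."""
    n_features = X.shape[1]
    # Solve (X^T X + alpha * I) W = X^T Y for all voxels at once.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    # Guard against constant columns to avoid division by zero.
    return (yt * yp).sum(axis=0) / np.where(denom == 0, 1.0, denom)

# Synthetic stand-ins (assumptions): 200 words, a 16-d sub-embedding, 50 voxels.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))            # sub-embedding per word
W_true = rng.standard_normal((16, 50))        # hidden word-to-voxel mapping
Y = X @ W_true + 0.5 * rng.standard_normal((200, 50))  # noisy "responses"

# Fit on the first 150 words, then score on the 50 held-out words.
W = fit_ridge(X[:150], Y[:150], alpha=1.0)
scores = voxelwise_correlation(Y[150:], X[150:] @ W)
print(scores.shape)  # one held-out correlation per voxel
```

In a real analysis the regularization strength would typically be chosen by cross-validation, and the per-voxel scores for different sub-embeddings could be compared to see where each subdimension is best predicted.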
Yunhao Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Cognitive Science
Shaonan Wang
The Hong Kong Polytechnic University
Natural Language Understanding of Machine and Mind
Nan Lin
State Key Laboratory of Cognitive Science and Mental Health, Institute of Psychology, CAS, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
Xinyi Dong
State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University
Chong Li
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Chengqing Zong
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China