The Mechanistic Emergence of Symbol Grounding in Language Models

📅 2025-10-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how symbol grounding—i.e., the spontaneous emergence of referential meaning linking linguistic symbols to perceptual inputs—arises in language models without explicit supervision. We develop a controlled evaluation framework integrating mechanistic interpretability analysis with causal intervention experiments to trace the dynamic alignment between symbolic representations and visual inputs within vision-language models, identifying its locus primarily in intermediate layers and its realization via coordinated aggregation across specific attention heads. For the first time, we establish causal grounding validation in both Transformer- and state-space-based architectures, demonstrating its robustness across multimodal dialogue settings and diverse model families—including bidirectional models—while finding no evidence in unidirectional LSTMs. Our findings uncover a general-purpose aggregation mechanism that substantially improves reliability prediction of generated content, offering a novel paradigm for understanding semantic emergence in large foundation models.

Technology Category

Application Category

📝 Abstract
Symbol grounding (Harnad, 1990) describes how symbols such as words acquire their meanings by connecting to real-world sensorimotor experiences. Recent work has shown preliminary evidence that grounding may emerge in (vision-)language models trained at scale without using explicit grounding objectives. Yet, the specific loci of this emergence and the mechanisms that drive it remain largely unexplored. To address this problem, we introduce a controlled evaluation framework that systematically traces how symbol grounding arises within the internal computations through mechanistic and causal analysis. Our findings show that grounding concentrates in middle-layer computations and is implemented through the aggregate mechanism, where attention heads aggregate the environmental ground to support the prediction of linguistic forms. This phenomenon replicates in multimodal dialogue and across architectures (Transformers and state-space models), but not in unidirectional LSTMs. Our results provide behavioral and mechanistic evidence that symbol grounding can emerge in language models, with practical implications for predicting and potentially controlling the reliability of generation.
Problem

Research questions and friction points this paper is trying to address.

Investigating how symbols acquire meaning in language models
Identifying specific loci and mechanisms driving symbol grounding emergence
Developing evaluation framework to trace grounding through causal analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mechanistic causal analysis traces grounding emergence
Aggregate attention heads implement grounding mechanism
Grounding concentrates in middle-layer transformer computations
🔎 Similar Papers
No similar papers found.