Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
The opaque mapping between self-attention operations and interpretable linguistic computations—particularly word sense disambiguation (WSD)—hinders understanding of how Transformer attention heads acquire functional specialization during training. Method: We propose a developmental probing framework, leveraging Pythia model checkpoints to systematically track attention head evolution via correlation analysis of attention behavior, targeted stimulus perturbations, and causal ablation experiments. Results: We identify a clear developmental trajectory: smaller models exhibit position-sensitive WSD heads, whereas larger models (e.g., 410M parameters) spontaneously develop robust, generalizable WSD-specialized heads. These critical heads maintain stable performance across diverse input perturbations, and their causal ablation significantly impairs WSD accuracy—providing the first empirical evidence of how and when attention heads specialize for lexical semantics during training.
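The summary's causal step, ablating a candidate WSD head and measuring the performance drop, can be illustrated with a minimal sketch. This is not the authors' pipeline (which operates on Pythia checkpoints); it is a toy NumPy multi-head self-attention layer where zero-ablating one head removes exactly that head's contribution to the output, leaving the other heads untouched. All shapes and the `ablate` mechanism here are illustrative assumptions.

```python
# Toy sketch of causal head ablation, assuming a minimal single-layer
# multi-head self-attention (not the paper's actual Pythia code).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, ablated_heads=()):
    """Per-head attention; heads listed in `ablated_heads` are zeroed out."""
    n_heads, d_model, d_head = Wq.shape
    outputs = []
    for h in range(n_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(d_head))
        out = attn @ v
        if h in ablated_heads:
            out = np.zeros_like(out)  # causal ablation: delete this head's contribution
        outputs.append(out)
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
seq, d_model, n_heads, d_head = 5, 8, 2, 4
x = rng.normal(size=(seq, d_model))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))

full = multi_head_attention(x, Wq, Wk, Wv)
ablated = multi_head_attention(x, Wq, Wk, Wv, ablated_heads={0})
print(np.allclose(ablated[:, :d_head], 0))                  # head 0's slice is zero
print(np.allclose(full[:, d_head:], ablated[:, d_head:]))   # other heads unchanged
```

In the paper's setting, the analogous intervention is applied inside a full Transformer and the downstream effect is read off as a change in disambiguation accuracy.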

📝 Abstract
Despite an in-principle understanding of self-attention matrix operations in Transformer language models (LMs), it remains unclear precisely how these operations map onto interpretable computations or functions, and how or when individual attention heads develop specialized attention patterns. Here, we present a pipeline to systematically probe attention mechanisms, and we illustrate its value by leveraging lexical ambiguity (where a single word has multiple meanings) to isolate attention mechanisms that contribute to word sense disambiguation. We take a "developmental" approach: first, using publicly available Pythia LM checkpoints, we identify inflection points in disambiguation performance for each LM in the suite; in 14M and 410M, we identify heads whose attention to disambiguating words covaries with overall disambiguation performance across development. We then stress-test the robustness of these heads to stimulus perturbations: in 14M, we find limited robustness, but in 410M, we identify multiple heads with surprisingly generalizable behavior. Then, in a causal analysis, we find that ablating the target heads demonstrably impairs disambiguation performance, particularly in 14M. We additionally reproduce developmental analyses of 14M across all of its random seeds. Together, these results suggest: that disambiguation benefits from a constellation of mechanisms, some of which (especially in 14M) are highly sensitive to the position and part-of-speech of the disambiguating cue; and that larger models (410M) may contain heads with more robust disambiguation behavior. They also join a growing body of work that highlights the value of adopting a developmental perspective when probing LM mechanisms.
Problem

Research questions and friction points this paper is trying to address.

Investigates how attention heads in Transformer models develop specialized patterns for disambiguation.
Probes the robustness of attention mechanisms to stimulus perturbations across model sizes.
Assesses the causal impact of ablating specific heads on word sense disambiguation performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probe attention mechanisms using lexical ambiguity
Identify heads covarying with disambiguation performance
Test head robustness via stimulus perturbations and causal contribution via ablation
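The correlation step listed above, finding heads whose attention covaries with disambiguation performance across training, can be sketched as follows. The checkpoint counts, attention values, and accuracies below are made-up toy numbers, and `pearson` is a hand-rolled helper; the paper's actual analysis works over real Pythia checkpoints.

```python
# Hypothetical sketch of the developmental correlation analysis: correlate
# each head's mean attention to the disambiguating cue (tracked across
# training checkpoints) with the model's disambiguation accuracy.
import numpy as np

# Rows: training checkpoints; columns: heads.
# Mean attention each head pays to the disambiguating word (toy values).
head_attention = np.array([
    [0.05, 0.10, 0.30],
    [0.06, 0.20, 0.28],
    [0.05, 0.35, 0.31],
    [0.07, 0.50, 0.29],
])
# Disambiguation accuracy at the same checkpoints (toy values).
accuracy = np.array([0.40, 0.55, 0.70, 0.85])

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

corrs = [pearson(head_attention[:, h], accuracy)
         for h in range(head_attention.shape[1])]
best = int(np.argmax(corrs))
print(best)  # head 1: its attention rises in lockstep with accuracy
```

Heads with correlations near 1 across development become the candidates for the perturbation and ablation analyses.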