🤖 AI Summary
How large language models (LLMs) internally represent abstract semantic structures, particularly semantic roles, remains poorly understood.
Method: We propose a “role-crossing minimal pair” localization method that integrates temporal emergence analysis, attribution tracing, and cross-architectural comparison across model scales to systematically identify and validate semantic role circuits.
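A minimal sketch of the localization idea, under stated assumptions: the model (gpt2 via Hugging Face transformers), the example sentence pair, and the use of pooled hidden-state differences as a lightweight proxy for the paper's attribution tracing are all illustrative choices, not the authors' actual pipeline.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative stand-in; not necessarily one of the paper's models
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

# A role-crossing minimal pair: identical words, swapped agent/patient
# roles, so activation differences should isolate role encoding.
pair = ("The dog chased the cat.", "The cat chased the dog.")

@torch.no_grad()
def pooled_hidden_states(text: str) -> torch.Tensor:
    out = model(**tok(text, return_tensors="pt"))
    # (n_layers + 1, seq_len, d_model) -> mean-pool over positions
    return torch.stack(out.hidden_states).squeeze(1).mean(dim=1)

diff = (pooled_hidden_states(pair[0]) - pooled_hidden_states(pair[1])).abs()
layer, dim = divmod(diff.argmax().item(), diff.shape[1])
print(f"most role-sensitive unit: layer {layer}, dimension {dim}")
```

Because the two sentences share all lexical material and differ only in role assignment, large activation differences plausibly point at role-sensitive units; the paper's full method additionally validates candidates causally.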
Contribution/Results: We discover a compact circuit of only 28 neurons that accounts for 89–94% of semantic role attribution, exhibits strong causal isolation, and transfers partially across model sizes. Further analysis reveals that the circuit operates via a progressive structural refinement mechanism, characterized by cross-scale conservation and high spectral similarity. This work provides the first neuron-level characterization of the dynamic formation pathway of semantic roles in LLMs, establishing a novel interpretability paradigm for semantic structure in foundation models.
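To make the concentration claim concrete, here is one plausible scoring check: given per-unit attribution magnitudes, how much of the total mass the top 28 units capture. The score vector below is synthetic; only the 28-unit threshold comes from the result above, and the scoring heuristic is an assumption of this sketch.

```python
import numpy as np

def attribution_coverage(scores: np.ndarray, k: int = 28) -> float:
    """Fraction of total attribution mass captured by the top-k units;
    k=28 mirrors the circuit size reported above."""
    ranked = np.sort(np.abs(scores))[::-1]
    return ranked[:k].sum() / ranked.sum()

# Synthetic, heavy-tailed attribution scores: a few units dominate,
# illustrating how a compact circuit can carry most of the attribution.
rng = np.random.default_rng(0)
scores = rng.pareto(1.5, size=4096)
print(f"top-28 coverage: {attribution_coverage(scores):.1%}")
```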
📝 Abstract
Although large language models (LLMs) display semantic competence, the internal mechanisms that ground abstract semantic structure remain insufficiently characterized. We propose a method integrating role-crossing minimal pairs, temporal emergence analysis, and cross-model comparison to study how LLMs implement semantic roles. Our analysis uncovers: (i) highly concentrated circuits (89–94% of attribution within 28 neurons); (ii) gradual structural refinement rather than phase transitions, with larger models sometimes bypassing localized circuits; and (iii) moderate cross-scale conservation (24–59% component overlap) alongside high spectral similarity. These findings suggest that LLMs form compact, causally isolated mechanisms for abstract semantic structure, and that these mechanisms transfer partially across scales and architectures.
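The abstract does not fix a definition of spectral similarity; one plausible operationalization, assuming the comparison is between singular-value spectra of circuit weight matrices at two scales, is sketched below. The matrix names and the synthetic demo are hypothetical.

```python
import numpy as np

def spectral_similarity(w_small: np.ndarray, w_large: np.ndarray, k: int = 28) -> float:
    """Cosine similarity between the top-k singular-value spectra of two
    circuit weight matrices. One plausible metric, not the paper's."""
    s1 = np.linalg.svd(w_small, compute_uv=False)[:k]
    s2 = np.linalg.svd(w_large, compute_uv=False)[:k]
    return float((s1 / np.linalg.norm(s1)) @ (s2 / np.linalg.norm(s2)))

# Synthetic demo: embed a small-scale circuit inside a larger noisy matrix,
# mimicking partial cross-scale conservation of circuit structure.
rng = np.random.default_rng(0)
w_small = rng.normal(size=(256, 256))
w_large = 0.1 * rng.normal(size=(512, 512))
w_large[:256, :256] += w_small
print(f"spectral similarity: {spectral_similarity(w_small, w_large):.3f}")
```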