🤖 AI Summary
This work addresses a gap in language model interpretability research, which has focused mainly on local representations and largely overlooked global structural organization. The authors propose StructLens, a framework that brings tree structures inspired by syntactic dependencies into representation analysis. By constructing maximum spanning trees over semantic representations derived from residual streams, StructLens characterizes the hierarchical organization of token representations. Combining semantic representation analysis, residual stream probing, and tracking of pretraining dynamics, the study finds that intermediate model layers exhibit the strongest local span structure. It also reports a hierarchical pattern during pretraining: small semantic units emerge early and are progressively integrated into larger constituents in later stages.
📝 Abstract
Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures as well. While interpretability research has investigated the components of language models, existing approaches focus on local inter-token relationships within layers or modules (e.g., Multi-Head Attention), leaving global inter-layer relationships largely overlooked. To address this gap, we introduce StructLens, an analytical framework designed to reveal how internal structures relate holistically through their inter-token connections within a layer. StructLens constructs maximum spanning trees based on the semantic representations in residual streams, analogous to dependency parsing, and leverages the tree properties to quantify inter-layer distance (or similarity) from a structural perspective. Our findings demonstrate that StructLens yields an inter-layer similarity pattern that is distinctly different from that of conventional cosine similarity. Moreover, this structure-aware similarity proves beneficial for practical tasks such as layer pruning, highlighting the effectiveness of structural analysis for understanding and optimizing language models. Our code is available at https://github.com/naist-nlp/structlens.
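The tree construction and structural distance described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes that edge weights are pairwise cosine similarities between token vectors in one layer, and that two layers are compared by the fraction of tree edges they do not share. The function names `cosine_similarity_matrix`, `maximum_spanning_tree`, and `tree_distance` are hypothetical, and random arrays stand in for real residual-stream activations.

```python
import numpy as np


def cosine_similarity_matrix(hidden):
    """Pairwise cosine similarity between token vectors (rows of `hidden`)."""
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    unit = hidden / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def maximum_spanning_tree(weights):
    """Prim's algorithm on a dense symmetric weight matrix.

    Returns the tree as a set of undirected edges (i, j) with i < j.
    """
    n = weights.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best_weight = weights[0].copy()       # best known link from each node to the tree
    best_parent = np.zeros(n, dtype=int)  # tree node providing that link
    edges = set()
    for _ in range(n - 1):
        # Pick the heaviest edge that connects a new node to the tree.
        candidates = np.where(in_tree, -np.inf, best_weight)
        j = int(np.argmax(candidates))
        i = int(best_parent[j])
        edges.add((min(i, j), max(i, j)))
        in_tree[j] = True
        # Node j may now offer heavier links to nodes still outside the tree.
        improved = (weights[j] > best_weight) & ~in_tree
        best_weight[improved] = weights[j][improved]
        best_parent[improved] = j
    return edges


def tree_distance(edges_a, edges_b):
    """Fraction of edges not shared by two spanning trees over the same tokens.

    Assumed metric for illustration; the paper may define distance differently.
    """
    if not edges_a:
        return 0.0
    return 1.0 - len(edges_a & edges_b) / len(edges_a)


# Stand-in data: 6 tokens with 16-dimensional states in two hypothetical layers.
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(6, 16))
layer_b = layer_a + 0.1 * rng.normal(size=(6, 16))

tree_a = maximum_spanning_tree(cosine_similarity_matrix(layer_a))
tree_b = maximum_spanning_tree(cosine_similarity_matrix(layer_b))
print(sorted(tree_a))
print(tree_distance(tree_a, tree_b))
```

Under these assumptions, two layers whose token vectors differ in magnitude or direction can still have a distance of zero if they induce the same tree, which is one way a structure-aware comparison could diverge from a direct cosine comparison of the layers.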