Embryology of a Language Model

📅 2025-08-01
🤖 AI Summary
Understanding how internal computational structures emerge during language model training remains a fundamental challenge in interpretability research. Method: Inspired by embryology, we take the susceptibility matrix, a tool drawn from statistical physics, and use it for discovery rather than only validation, applying UMAP to it to visualize how the network's organization develops over training. Contribution/Results: The visualizations reveal a global structure analogous to a “body plan” and uncover previously unknown functional structures, including a “spacing fin” dedicated to counting space tokens. The resulting developmental picture recovers known structures (e.g., the induction circuit) and charts their formation over training. This work supports a developmental approach to interpretability, one that studies how interpretability-relevant structures mature during training, and shows that susceptibility analysis can serve as a holistic lens for mechanistic analysis of language models.

📝 Abstract
Understanding how language models develop their internal computational structure is a central problem in the science of deep learning. While susceptibilities, drawn from statistical physics, offer a promising analytical tool, their full potential for visualizing network organization remains untapped. In this work, we introduce an embryological approach, applying UMAP to the susceptibility matrix to visualize the model's structural development over training. Our visualizations reveal the emergence of a clear “body plan,” charting the formation of known features like the induction circuit and discovering previously unknown structures, such as a “spacing fin” dedicated to counting space tokens. This work demonstrates that susceptibility analysis can move beyond validation to uncover novel mechanisms, providing a powerful, holistic lens for studying the developmental principles of complex neural networks.

Problem

Research questions and friction points this paper is trying to address.

Understanding how language models develop their internal computational structure
Visualizing network organization through analysis of the susceptibility matrix
Discovering novel structures and mechanisms that emerge during training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies UMAP to the susceptibility matrix to visualize network organization
Reveals the model's “body plan” as it develops over training
Discovers previously unknown structures such as the “spacing fin”
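
As a rough illustration of the pipeline described above (a 2-D embedding of a susceptibility matrix, repeated at several points in training), here is a minimal Python sketch. It is not the authors' code. The matrix layout (rows as tokens, columns as model components), the column standardization, the UMAP settings, and the synthetic data are all assumptions made for illustration, and `standardize_columns`, `embed_rows`, and `synthetic_susceptibilities` are hypothetical helper names. The sketch uses `umap.UMAP` from the `umap-learn` package when it is installed and otherwise falls back to a plain PCA projection, which is a different technique and is included only so the example still runs.

```python
import numpy as np


def standardize_columns(matrix):
    """Center each column and scale it to unit variance (an illustrative choice)."""
    centered = matrix - matrix.mean(axis=0, keepdims=True)
    scale = centered.std(axis=0, keepdims=True)
    scale[scale == 0] = 1.0  # constant columns stay at zero instead of dividing by zero
    return centered / scale


def embed_rows(matrix, n_neighbors=15, seed=0):
    """Project the rows of `matrix` to 2-D with UMAP, or with PCA if umap-learn is absent."""
    data = standardize_columns(np.asarray(matrix, dtype=float))
    try:
        import umap  # provided by the third-party `umap-learn` package
    except ImportError:
        # Fallback only so the sketch still runs: top-2 principal components via SVD.
        u, s, _ = np.linalg.svd(data, full_matrices=False)
        return u[:, :2] * s[:2]
    reducer = umap.UMAP(n_components=2, n_neighbors=n_neighbors, random_state=seed)
    return reducer.fit_transform(data)


def synthetic_susceptibilities(n_tokens, n_components, strength, seed=0):
    """Placeholder data: two token groups whose separation grows with `strength`."""
    rng = np.random.default_rng(seed)
    group = np.arange(n_tokens) % 2
    pattern = rng.normal(size=n_components)
    noise = rng.normal(size=(n_tokens, n_components))
    return noise + strength * np.outer(2 * group - 1, pattern)


# Hypothetical checkpoints: training step -> susceptibility matrix (tokens x components).
checkpoints = {
    step: synthetic_susceptibilities(120, 8, strength)
    for step, strength in [(0, 0.0), (1000, 1.0), (10000, 3.0)]
}

# One 2-D embedding per checkpoint; plotting them side by side gives a crude "atlas".
embeddings = {step: embed_rows(matrix) for step, matrix in checkpoints.items()}
for step, points in embeddings.items():
    print(step, points.shape)
```

With real data, each entry of `checkpoints` would hold a susceptibility matrix computed from a saved model checkpoint, and the per-checkpoint scatter plots would be compared to see when structures such as the induction circuit separate from the rest. Each checkpoint is embedded independently here, so the axes are not aligned across training steps; aligning them would need an extra step that this sketch leaves out.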