🤖 AI Summary
Understanding how large language models (LLMs) acquire and represent factual knowledge remains critical for improving their interpretability and reliability.
Method: Leveraging the full pretraining trajectory of OLMo-7B, we propose a temporal probing framework to quantitatively analyze the functional evolution of attention heads and feed-forward networks (FFNs) across entity identification, relation modeling, and factual QA tasks.
Contribution/Results: We uncover dynamic role-migration patterns: attention heads exhibit high turnover, whereas FFNs demonstrate strong functional stability; positional relational knowledge converges earlier than named-entity relational knowledge. The model initially relies on generic representations but progressively develops functional specialization and module reuse, enabling us to construct a fine-grained, temporally resolved atlas of knowledge formation. This work provides the first empirical, time-resolved, and component-level evidence characterizing how factual knowledge emerges and organizes within LLMs during pretraining.
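To make the temporal probing idea concrete, here is a minimal sketch (not the paper's actual code; the checkpoint names, activation shapes, synthetic data, and least-squares linear probe are all assumptions for illustration): fit a linear probe on one component's activations at each pretraining checkpoint and track how its task accuracy evolves over time.

```python
import numpy as np

rng = np.random.default_rng(0)

def probe_accuracy(acts, labels):
    """Fit a least-squares linear probe on component activations
    and return its accuracy on the probing task."""
    X = np.hstack([acts, np.ones((len(acts), 1))])  # add bias column
    y = np.where(labels == 1, 1.0, -1.0)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    preds = (X @ w) > 0
    return float(np.mean(preds == (labels == 1)))

# Synthetic stand-in for one attention head's activations at three
# hypothetical pretraining checkpoints: the injected task signal
# grows as training proceeds, mimicking gradual knowledge acquisition.
labels = rng.integers(0, 2, size=200)
checkpoints = {}
for name, signal in [("step_1000", 0.1), ("step_10000", 0.5), ("step_100000", 2.0)]:
    acts = rng.normal(size=(200, 16))
    acts[:, 0] += signal * labels  # inject a progressively stronger signal
    checkpoints[name] = acts

trajectory = {ckpt: probe_accuracy(a, labels) for ckpt, a in checkpoints.items()}
for ckpt, acc in trajectory.items():
    print(ckpt, round(acc, 2))
```

Repeating this per component and per task (entity identification, relation modeling, factual QA) yields the accuracy-over-time curves from which functional roles can be read off.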
📝 Abstract
Understanding how large language models (LLMs) acquire and store factual knowledge is crucial for enhancing their interpretability and reliability. In this work, we analyze the evolution of factual knowledge representation in the OLMo-7B model by tracking the roles of its attention heads and feed-forward networks (FFNs) over the course of pretraining. We classify these components into four roles: general, entity-specific, relation-answer, and fact-answer, and examine their stability and transitions. Our results show that LLMs initially depend on broad, general-purpose components, which specialize as training progresses. Once the model reliably predicts answers, some components are repurposed, suggesting an adaptive learning process. Notably, attention heads display the highest turnover, while FFNs remain comparatively stable throughout training. Furthermore, our probing experiments reveal that location-based relations converge to high accuracy earlier in training than name-based relations, highlighting how task complexity shapes acquisition dynamics. These insights offer a mechanistic view of knowledge formation in LLMs.
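One simple way to quantify the turnover and stability contrast described above (a sketch under assumed data shapes, not the authors' actual metric) is the Jaccard overlap between the sets of top-scoring components at two consecutive checkpoints: high-turnover components shuffle in and out of the top set, while stable components stay.

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k highest-scoring components."""
    return set(np.argsort(scores)[-k:])

def jaccard(a, b):
    """Overlap between two component sets in [0, 1]."""
    return len(a & b) / len(a | b)

rng = np.random.default_rng(1)
n_heads, n_ffns, k = 32, 32, 8

# Synthetic per-component probing scores at two checkpoints.
# Head scores are reshuffled independently (high turnover), while
# FFN scores drift only slightly (stable functional roles).
head_t0 = rng.random(n_heads)
head_t1 = rng.random(n_heads)                  # independent -> low overlap
ffn_t0 = rng.random(n_ffns)
ffn_t1 = ffn_t0 + rng.normal(0, 0.01, n_ffns)  # small drift -> high overlap

head_overlap = jaccard(top_k(head_t0, k), top_k(head_t1, k))
ffn_overlap = jaccard(top_k(ffn_t0, k), top_k(ffn_t1, k))
print(f"head overlap: {head_overlap:.2f}, FFN overlap: {ffn_overlap:.2f}")
```

Applied across the full sequence of checkpoints, such overlap curves distinguish components that keep one role from those that are repurposed once the model reliably predicts answers.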