🤖 AI Summary
This study investigates how the internal representations of large language models (LLMs) evolve during training, specifically the temporal progression of multilingual competence and abstract knowledge acquisition.
Method: We train sparse autoencoders at multiple training checkpoints and conduct interpretability analysis on intermediate-layer representations to systematically trace the developmental trajectory of linguistic knowledge and conceptual encoding (a minimal sketch follows this summary).
Contribution/Results: We uncover a staged developmental pattern in LLMs: monolingual, language-specific knowledge emerges before cross-lingual alignment, and token-level representations precede hierarchical abstraction into higher-order concepts. Our work establishes the first dynamic representation evolution map for LLMs, empirically demonstrating a temporal hierarchy in language learning and providing an analytical framework for understanding the intrinsic learning mechanisms of foundation models.
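To make the method concrete, below is a minimal sketch of the kind of sparse autoencoder such an analysis trains on intermediate-layer activations, one per checkpoint. This is an illustration under standard assumptions (ReLU encoder, L1 sparsity penalty); the expansion factor, L1 coefficient, and the random `acts` tensor are stand-ins, not the paper's actual configuration or data.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal L1-regularized sparse autoencoder over LLM activations."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; combined with the
        # L1 penalty below, most features are driven to zero on any input.
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return reconstruction, features

def sae_loss(x, reconstruction, features, l1_coeff=1e-3):
    # Reconstruction fidelity plus a sparsity penalty on the feature code.
    mse = (reconstruction - x).pow(2).mean()
    return mse + l1_coeff * features.abs().mean()

# One SAE is trained per checkpoint. Illustrative sizes and data: real
# activations would be collected from an intermediate layer of the model.
d_model, d_hidden = 768, 8 * 768
acts = torch.randn(4096, d_model)        # stand-in for collected activations
sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

for step in range(200):
    recon, feats = sae(acts)
    loss = sae_loss(acts, recon, feats)
    opt.zero_grad()
    loss.backward()
    opt.step()
```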
📝 Abstract
Large Language Models (LLMs) demonstrate remarkable multilingual capabilities and broad knowledge. However, the internal mechanisms underlying the development of these capabilities remain poorly understood. To investigate this, we analyze how the information encoded in LLMs' internal representations evolves during training. Specifically, we train sparse autoencoders at multiple checkpoints of the model and systematically compare the interpretability results across these stages. Our findings suggest that LLMs first acquire language-specific knowledge independently, and only later learn cross-linguistic correspondences. Moreover, we observe that after mastering token-level knowledge, the model transitions to learning higher-level, abstract concepts, indicating the development of a more conceptual understanding.
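The cross-stage comparison described above can be operationalized in several ways; one plausible approach (an assumption for illustration, not the paper's stated procedure) is to match features between two checkpoints' SAEs by the cosine similarity of their decoder directions, so that a feature with no close counterpart early in training but a stable identity later reads as newly emerged:

```python
import torch
import torch.nn.functional as F

def match_features(dirs_a: torch.Tensor, dirs_b: torch.Tensor):
    """Match SAE features across checkpoints by decoder-direction similarity.

    dirs_a, dirs_b: [n_features, d_model] matrices whose rows are each
    feature's decoder direction (an nn.Linear decoder weight, transposed).
    Returns, for every feature in dirs_a, the index of its best match in
    dirs_b and the corresponding cosine score.
    """
    sims = F.normalize(dirs_a, dim=-1) @ F.normalize(dirs_b, dim=-1).T
    best_sim, best_idx = sims.max(dim=-1)
    return best_idx, best_sim

# Hypothetical usage: decoder directions from an early and a late checkpoint.
dirs_early = torch.randn(2048, 768)   # stand-ins for trained SAE weights
dirs_late = torch.randn(2048, 768)
idx, sim = match_features(dirs_late, dirs_early)

# Late-checkpoint features with no close early counterpart (the threshold
# is a hypothetical choice) are candidates for newly acquired concepts.
newly_emerged = (sim < 0.5).nonzero().squeeze(-1)
```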