Clustering Head: A Visual Case Study of the Training Dynamics in Transformers

📅 2024-10-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the intrinsic learning mechanisms of Transformers on sparse modular addition, focusing on the representation and evolution of task invariance within the $R^2$ embedding space. Method: We construct a 2D embedding visualization sandbox to systematically track training dynamics across attention heads and feed-forward networks (FFNs) layer-wise. Contribution/Results: We identify and name a novel circuit, "clustering heads," that explicitly separates modular addition equivalence classes via interpretable clustering behavior, exhibiting a two-phase learning trajectory: coarse-grained clustering followed by boundary refinement. We show that the emergence of this circuit is sensitive to weight initialization, curriculum learning strategies, and the high-curvature geometry induced by normalization layers, factors that also explain characteristic loss spikes during training. Our approach achieves fine-grained interpretability of Transformer training on a controlled task, offering a new paradigm for analyzing structured inductive biases in deep sequence models.

📝 Abstract
This paper introduces the sparse modular addition task and examines how transformers learn it. We focus on transformers with embeddings in $R^2$ and introduce a visual sandbox that provides comprehensive visualizations of each layer throughout the training process. We reveal a type of circuit, called "clustering heads," which learns the problem's invariants. We analyze the training dynamics of these circuits, highlighting two-stage learning, loss spikes due to high curvature or normalization layers, and the effects of initialization and curriculum learning.
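To make the task concrete, here is a minimal sketch of how a sparse modular addition example might be generated. This is an assumed reading of the task (the paper is not quoted here): the label is the sum, mod p, of the tokens at a small fixed set of "active" positions, while the remaining positions act as distractors the model must learn to be invariant to. The function name and parameter choices are illustrative, not from the paper.

```python
import random

def make_example(seq_len=12, p=5, active=(0, 1, 2), rng=random):
    """Generate one sparse modular addition example (assumed task definition).

    tokens: a sequence of seq_len integers drawn uniformly from {0, ..., p-1}.
    label:  the sum of the tokens at the 'active' positions, taken mod p.
            All other positions are irrelevant to the label, which is the
            'sparse' invariance the clustering heads are said to learn.
    """
    tokens = [rng.randrange(p) for _ in range(seq_len)]
    label = sum(tokens[i] for i in active) % p
    return tokens, label
```

Under this reading, two sequences belong to the same equivalence class whenever their active-position sums agree mod p, which is the class structure the "clustering heads" would separate in the 2D embedding space.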
Problem

Research questions and friction points this paper is trying to address.

Transformer Models
Sparse Modular Addition Learning
Training Dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Modular Addition Task
Transformer Circuits
Visual Sandbox for Training Dynamics