🤖 AI Summary
This study investigates the dynamical mechanisms underlying grokking -- the abrupt transition from memorization to generalization in neural networks. Through finite-size scaling analysis and a gradient avalanche model, it finds that grokking is fundamentally a dimensional phase transition driven by the geometry of the gradient field: the effective dimensionality \( D(t) \) jumps from a subdiffusive regime (\( D < 1 \)) to a superdiffusive one (\( D > 1 \)) precisely at the onset of generalization. The transition exhibits self-organized criticality and is robust across diverse network topologies. Although backpropagation-induced correlations in real training produce a dimensional excess relative to synthetic gradients, the critical behavior of \( D(t) \) crossing unity persists. The work thus frames grokking as a universal, architecture-agnostic dynamical phenomenon.
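A minimal numerical sketch of the claimed \( D(t) \) crossing may help fix ideas. The summary does not specify the estimator for \( D \); the code below assumes the standard diffusion-exponent definition, \( \mathrm{MSD}(\tau) \propto \tau^{D} \), and uses a toy trajectory whose increment correlations flip sign halfway through as a stand-in for training dynamics. The helper `msd_exponent`, the AR(1) trajectory, and all parameters are illustrative assumptions, not the authors' method.

```python
import numpy as np

def msd_exponent(increments, lags):
    """Slope of log MSD vs. log lag for the path obtained by summing increments."""
    path = np.cumsum(increments, axis=0)
    msd = [np.mean(np.sum((path[l:] - path[:-l]) ** 2, axis=1)) for l in lags]
    return np.polyfit(np.log(lags), np.log(msd), 1)[0]

rng = np.random.default_rng(0)
T, P = 40_000, 16
noise = rng.standard_normal((T, P))

# Toy trajectory: anticorrelated increments (sub-diffusive, D < 1) before the
# midpoint, positively correlated increments (super-diffusive, D > 1) after it.
rho_pre, rho_post = -0.6, 0.6
inc = np.empty_like(noise)
inc[0] = noise[0]
for t in range(1, T):
    rho = rho_pre if t < T // 2 else rho_post
    inc[t] = rho * inc[t - 1] + np.sqrt(1 - rho**2) * noise[t]

# Sliding-window exponent D(t); the crossing of 1 marks the regime switch.
window, lags = 4_000, np.arange(1, 64)
D_t = [msd_exponent(inc[s : s + window], lags)
       for s in range(0, T - window + 1, window)]
crossing = next(i for i, d in enumerate(D_t) if d > 1)
print([round(d, 2) for d in D_t], "-> crosses 1 at window", crossing)
```

The per-window exponent sits below 1 in the anticorrelated regime and above 1 in the persistent one, so the crossing of unity localizes the regime switch, mirroring how the paper localizes grokking.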
📝 Abstract
Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a \textit{dimensional phase transition}: the effective dimensionality~$D$ crosses unity at generalization onset, separating a sub-diffusive (subcritical, $D < 1$) regime from a super-diffusive (supercritical, $D > 1$) one, and exhibiting self-organized criticality (SOC). Crucially, $D$ reflects \textbf{gradient field geometry}, not network architecture: synthetic i.i.d.\ Gaussian gradients maintain $D \approx 1$ regardless of graph topology, while real training exhibits a dimensional excess from backpropagation correlations. The $D(t)$ crossing, localized at grokking and robust across topologies, offers new insight into the trainability of overparameterized networks.
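To make the gradient-geometry contrast concrete, here is a small sketch (not from the paper) under the same assumed diffusion-exponent definition of $D$: temporally i.i.d.\ Gaussian gradients give $D \approx 1$ even after being mixed through a fixed random coupling matrix (a crude proxy for graph topology), while temporally correlated gradients (a crude proxy for backpropagation-induced correlations) push the measured exponent above 1 over the fitted lag window. All names and parameters are illustrative assumptions.

```python
import numpy as np

def msd_exponent(increments, lags):
    """Diffusion exponent: slope of log MSD vs. log lag."""
    path = np.cumsum(increments, axis=0)
    msd = [np.mean(np.sum((path[l:] - path[:-l]) ** 2, axis=1)) for l in lags]
    return np.polyfit(np.log(lags), np.log(msd), 1)[0]

rng = np.random.default_rng(1)
T, P = 20_000, 32
lags = np.unique(np.logspace(0, 2.5, 20).astype(int))

# Temporally i.i.d. Gaussian "gradients": ordinary diffusion, D ~ 1.
iid = rng.standard_normal((T, P))
print("i.i.d.           :", round(msd_exponent(iid, lags), 2))

# Mixing coordinates through a fixed random coupling (a stand-in for graph
# topology) leaves increments uncorrelated in time, so D stays ~ 1.
W = rng.standard_normal((P, P)) / np.sqrt(P)
print("i.i.d. + coupling:", round(msd_exponent(iid @ W, lags), 2))

# Temporal AR(1) correlation (a crude proxy for backprop-induced
# correlations) inflates the measured exponent above 1 over these lags.
rho = 0.8
corr = np.empty_like(iid)
corr[0] = iid[0]
for t in range(1, T):
    corr[t] = rho * corr[t - 1] + np.sqrt(1 - rho**2) * iid[t]
print("correlated       :", round(msd_exponent(corr, lags), 2))
```

The point of the coupling matrix `W` is that purely spatial mixing of i.i.d. increments cannot change the temporal scaling of the walk; only correlations along the time axis can, which is one way to read the abstract's claim that $D$ probes gradient-field geometry rather than architecture.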