Grokking as Dimensional Phase Transition in Neural Networks

πŸ“… 2026-04-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study investigates the dynamical mechanism underlying grokking, the abrupt transition from memorization to generalization in neural networks. Using finite-size scaling analysis and a gradient avalanche model, it argues that grokking is fundamentally a dimensional phase transition driven by the geometry of the gradient field: the effective dimensionality \( D(t) \) crosses from a sub-diffusive regime (\( D < 1 \)) to a super-diffusive one (\( D > 1 \)) precisely at the onset of generalization, exhibiting self-organized criticality that is robust across diverse network topologies. Although backpropagation-induced correlations in real training produce dimensional overshoot, the critical crossing of \( D(t) \) through unity persists, framing grokking as a universal, architecture-agnostic dynamical phenomenon.
πŸ“ Abstract
Neural network grokking -- the abrupt memorization-to-generalization transition -- challenges our understanding of learning dynamics. Through finite-size scaling of gradient avalanche dynamics across eight model scales, we find that grokking is a *dimensional phase transition*: effective dimensionality \( D \) crosses from sub-diffusive (subcritical, \( D < 1 \)) to super-diffusive (supercritical, \( D > 1 \)) at generalization onset, exhibiting self-organized criticality (SOC). Crucially, \( D \) reflects **gradient field geometry**, not network architecture: synthetic i.i.d. Gaussian gradients maintain \( D \approx 1 \) regardless of graph topology, while real training exhibits dimensional excess from backpropagation correlations. The grokking-localized \( D(t) \) crossing -- robust across topologies -- offers new insight into the trainability of overparameterized networks.
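The abstract's sub-/super-diffusive distinction can be illustrated with a simple numerical check. The sketch below (not the paper's code; `diffusion_exponent` is a hypothetical helper) estimates an anomalous-diffusion exponent from a trajectory of gradient-like steps by fitting mean-squared displacement against lag on a log-log scale. For i.i.d. Gaussian steps, the abstract's null case, the exponent should come out near 1, i.e. ordinary diffusion, matching the claim that \( D \approx 1 \) for synthetic uncorrelated gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_exponent(steps):
    """Estimate the anomalous-diffusion exponent alpha from a sequence of
    update steps, via a log-log fit of mean-squared displacement:
    MSD(tau) ~ tau**alpha. alpha < 1 is sub-diffusive, alpha ~ 1 ordinary
    diffusion, alpha > 1 super-diffusive."""
    traj = np.cumsum(steps, axis=0)  # position in parameter space over time
    # log-spaced lags up to a quarter of the trajectory length
    lags = np.unique(np.logspace(0, np.log10(len(traj) // 4), 20).astype(int))
    msd = np.array([np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
                    for lag in lags])
    alpha, _ = np.polyfit(np.log(lags), np.log(msd), 1)
    return alpha

# i.i.d. Gaussian "gradients" in 8 dimensions: expect alpha close to 1
steps = rng.standard_normal((20_000, 8))
print(f"estimated exponent: {diffusion_exponent(steps):.2f}")
```

Correlated steps (e.g. steps with a persistent drift component) would push the fitted exponent above 1, the super-diffusive regime the paper associates with the onset of generalization.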
Problem

Research questions and friction points this paper is trying to address.

grokking
dimensional phase transition
neural networks
generalization
gradient dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

dimensional phase transition
grokking
gradient field geometry
self-organized criticality
effective dimensionality
πŸ”Ž Similar Papers
No similar papers found.