GrokAlign: Geometric Characterisation and Acceleration of Grokking

📅 2025-06-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the "grokking" phenomenon (the delayed emergence of generalisation and robustness during deep network training) and traces its geometric origin to the dynamic alignment between a network's Jacobian and its training data. Method: the authors establish a causal link between Jacobian alignment and grokking and propose GrokAlign, a regularizer that explicitly enforces this alignment via centroid-based geometric constraints. The approach combines geometric analysis of the Jacobian, cosine-similarity alignment measurement, low-rank Jacobian modelling, and differentiable alignment optimisation. Contribution/Results: GrokAlign induces grokking substantially sooner than baselines such as weight decay, achieves stronger generalisation and robustness, and enables precise identification and tracking of stage-wise training dynamics. It offers an interpretable, geometry-driven lens on the implicit optimisation mechanisms of deep learning.

📝 Abstract
A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks -- a method we introduce as GrokAlign -- which we show empirically to induce grokking much sooner than more conventional regularizers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. Accompanying webpage: https://thomaswalker1.github.io/blog/grokalign.html and code: https://github.com/ThomasWalker1/grokalign
Problem

Research questions and friction points this paper is trying to address.

Understand and accelerate deep network training dynamics
Explain grokking via Jacobian alignment with data
Introduce GrokAlign for faster grokking induction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns Jacobians with training data
Uses low-rank Jacobian assumption
Introduces centroid alignment simplification
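To make the core idea concrete, here is a minimal sketch of a Jacobian-alignment penalty in the cosine-similarity sense described above. It is an illustrative stand-in, not the paper's implementation: the toy two-layer ReLU network, the row-wise cosine measure, and the `alignment_penalty` name are all assumptions introduced for this example; see the linked code repository for the authors' actual GrokAlign regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def jacobian(W1, W2, x):
    """Jacobian of f(x) = W2 @ relu(W1 @ x) at x, shape (out_dim, in_dim).

    For a piecewise affine network, the Jacobian is the slope of the
    affine piece containing x, determined by the ReLU gate pattern.
    """
    mask = (W1 @ x > 0).astype(float)      # which ReLU units are active at x
    return W2 @ (mask[:, None] * W1)


def alignment_penalty(W1, W2, x, eps=1e-8):
    """1 minus the mean cosine similarity between Jacobian rows and the input.

    Hypothetical penalty: driving it toward zero pushes each row of J(x)
    to point along x, i.e. it aligns the Jacobian with the training datum.
    """
    J = jacobian(W1, W2, x)
    cos = (J @ x) / (np.linalg.norm(J, axis=1) * np.linalg.norm(x) + eps)
    return 1.0 - cos.mean()


W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
x = rng.normal(size=4)
print(f"alignment penalty at init: {alignment_penalty(W1, W2, x):.3f}")
```

In a differentiable framework, this scalar would simply be added to the task loss so that gradient descent minimises misalignment alongside the training objective, which is the role the paper assigns to GrokAlign.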