The Complexity Dynamics of Grokking

📅 2024-12-13
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
This work studies grokking, in which neural networks transition from memorizing to generalizing solutions long after overfitting the training data, through the lens of compression. Method: the authors introduce a computable measure of intrinsic network complexity grounded in Kolmogorov complexity and track it throughout training, finding a consistent rise-and-fall pattern: complexity climbs during memorization and then declines as generalization emerges. Drawing on rate-distortion theory and the minimum description length principle, they lay out a principled approach to lossy compression of neural networks and connect the complexity measure to explicit generalization bounds. They further propose spectral entropy regularization, which encourages low-rank representations by penalizing the entropy of a network's singular-value spectrum. Results: the regularizer outperforms baselines in total compression of the dataset, and the framework ties rate-distortion theory, minimum description length, and training dynamics together into an account of why generalization emerges.
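To make the "computable intrinsic complexity measure" concrete, here is a minimal sketch, not the paper's estimator: a common stand-in for Kolmogorov complexity is the compressed size of coarsely quantized weights (the 8-bit quantization and zlib compressor below are assumptions). Logging this quantity each epoch alongside train/test accuracy is enough to expose the rise-and-fall dynamics described above.

```python
import zlib
import numpy as np
import torch

def compressed_size_bits(model: torch.nn.Module, n_bits: int = 8) -> int:
    """Crude complexity proxy: zlib-compressed size (in bits) of quantized weights."""
    chunks = []
    for p in model.parameters():
        w = p.detach().cpu().numpy().ravel()
        scale = np.abs(w).max() + 1e-12                       # per-tensor scale
        q = np.round(w / scale * (2 ** (n_bits - 1) - 1))     # map to [-127, 127]
        chunks.append(q.astype(np.int8).tobytes())
    return 8 * len(zlib.compress(b"".join(chunks), level=9))  # upper bound on description length
```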

📝 Abstract
We investigate the phenomenon of generalization through the lens of compression. In particular, we study the complexity dynamics of neural networks to explain grokking, where networks suddenly transition from memorizing to generalizing solutions long after over-fitting the training data. To this end we introduce a new measure of intrinsic complexity for neural networks based on the theory of Kolmogorov complexity. Tracking this metric throughout network training, we find a consistent pattern in training dynamics, consisting of a rise and fall in complexity. We demonstrate that this corresponds to memorization followed by generalization. Based on insights from rate-distortion theory and the minimum description length principle, we lay out a principled approach to lossy compression of neural networks, and connect our complexity measure to explicit generalization bounds. Based on a careful analysis of information capacity in neural networks, we propose a new regularization method which encourages networks towards low-rank representations by penalizing their spectral entropy, and find that our regularizer outperforms baselines in total compression of the dataset.
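The spectral entropy penalty mentioned in the abstract can be sketched as follows. This is a minimal illustration assuming a plain additive penalty on every 2-D weight matrix with coefficient `lam`; the paper's exact formulation may differ.

```python
import torch

def spectral_entropy(weight: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Shannon entropy of the normalized singular-value spectrum (low for low-rank matrices)."""
    s = torch.linalg.svdvals(weight)          # differentiable singular values
    p = s / (s.sum() + eps)                   # normalize into a distribution
    return -(p * torch.log(p + eps)).sum()

def spectral_entropy_penalty(model: torch.nn.Module) -> torch.Tensor:
    """Sum of spectral entropies over all 2-D weight matrices."""
    terms = [spectral_entropy(p) for p in model.parameters() if p.ndim == 2]
    return torch.stack(terms).sum()

# Typical usage inside a training step (names `criterion` and `lam` are assumptions):
# loss = criterion(model(x), y) + lam * spectral_entropy_penalty(model)
```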
Problem

Research questions and friction points this paper is trying to address.

Studying the complexity phase transition in neural network training
Characterizing the grokking phenomenon, in which networks shift from memorization to generalization
Measuring network complexity via compression and relating it to regularization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rate-distortion theory measures network complexity
Spectral entropy regularization reduces intrinsic dimension
Lossy compression framework tracks complexity dynamics
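A hedged sketch of the lossy-compression viewpoint: under a rate-distortion / MDL reading, a weight matrix can be replaced by the lowest-rank approximation that stays within a distortion budget, and the retained parameters give a description-length proxy. The truncation rule and Frobenius-error budget below are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def truncate_to_distortion(weight: torch.Tensor, max_rel_err: float = 0.05):
    """Lowest-rank approximation of `weight` within a relative Frobenius error budget."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    energy = S ** 2
    # tail[r] = spectral energy discarded if only the first r singular values are kept
    tail = torch.flip(torch.cumsum(torch.flip(energy, dims=[0]), dim=0), dims=[0])
    r = max(int((tail / energy.sum() > max_rel_err ** 2).sum().item()), 1)
    approx = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]
    description_length = r * (weight.shape[0] + weight.shape[1])  # retained parameters
    return approx, description_length
```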