Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks

📅 2025-05-19
🤖 AI Summary
This work investigates the relationship between training dynamics and final generalization performance in deep learning. Method: The authors propose dual dynamical scaling laws covering the entire training trajectory, bridging implicit bias and generalization via a spectral complexity norm. The approach integrates spectral analysis, implicit bias modeling, and the optimization dynamics of binary cross-entropy, and is validated empirically across CNNs, ResNets, and ViTs on MNIST, CIFAR-10, and CIFAR-100. Contribution/Results: Theoretical analysis shows that, in a single-layer perceptron, the growth of spectral complexity driven by the implicit bias mirrors the generalization behavior observed at fixed norm, connecting the performance dynamics to classical learning rules. The framework unifies test error scaling behavior across architectures, from shallow models to ViTs, recovering and extending classical scaling laws. It proposes a mechanistic chain, "implicit bias → spectral complexity evolution → generalization emergence," that improves both the interpretability and the generality of deep learning generalization analyses.

📝 Abstract
Scaling laws in deep learning - empirical power-law relationships linking model performance to resource growth - have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training or on the optimal training time given the model size. In this work, we uncover a richer picture by analyzing the entire training dynamics through the lens of spectral complexity norms. We identify two novel dynamical scaling laws that govern how performance evolves during training. These laws together recover the well-known test error scaling at convergence, offering a mechanistic explanation of generalization emergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10 and CIFAR-100. Furthermore, we provide analytical support using a solvable model: a single-layer perceptron trained with binary cross-entropy. In this setting, we show that the growth of spectral complexity driven by the implicit bias mirrors the generalization behavior observed at fixed norm, allowing us to connect the performance dynamics to classical learning rules in the perceptron.
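The solvable setting described in the abstract, a single-layer perceptron trained with binary cross-entropy on separable data, can be sketched numerically. The snippet below is a minimal illustration (not the paper's code): the data setup, teacher vector, and hyperparameters are assumptions chosen for the sketch. It shows the hallmark of the implicit bias on separable data: the loss has no finite minimizer, so the weight norm keeps growing while the weight direction converges.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linearly separable data: labels from a fixed "teacher" vector.
# Dimensions and sample size are arbitrary choices for this sketch.
d, n = 20, 200
teacher = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = (X @ teacher > 0).astype(float)

w = np.zeros(d)
lr = 0.5
norms = []
for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid outputs
    grad = X.T @ (p - y) / n            # gradient of the binary cross-entropy
    w -= lr * grad
    norms.append(np.linalg.norm(w))

# On separable data the norm grows without bound (roughly logarithmically
# in time), while w / |w| converges toward the max-margin direction.
print(norms[99], norms[-1])
```

Tracking `np.linalg.norm(w)` over training is the single-layer analogue of the spectral complexity norm discussed in the abstract; for a one-layer linear map the spectral norm of the weight matrix reduces to the Euclidean norm of the weight vector.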
Problem

Research questions and friction points this paper is trying to address.

Understanding scaling laws in deep learning dynamics
Analyzing training evolution via spectral complexity norms
Linking implicit bias to generalization in perceptrons
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes training dynamics via spectral complexity norms
Identifies two novel dynamical scaling laws
Connects performance dynamics to classical learning rules
Francesco D'Amico
PhD student, Sapienza University of Rome
Disordered Systems · Neural Networks · Machine Learning
Dario Bocchi
Sapienza Università di Roma
Theoretical Physics · Statistical Mechanics
Matteo Negri
Physics Department, University of Rome Sapienza, Piazzale Aldo Moro 5, Rome 00185; CNR-Nanotec Rome unit, Piazzale Aldo Moro 5, Rome 00185