High-dimensional Limit of SGD for Diagonal Linear Networks

📅 2026-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the optimization and generalization dynamics of stochastic gradient descent (SGD) on diagonal linear networks in the high-dimensional regime. The authors show that SGD trajectories are well approximated by a stochastic differential equation (SDE) that explicitly separates the drift from the gradient noise, and they derive a deterministic partial differential equation that describes the time evolution of a broad class of observable statistics, including the risk and curvature. Under a suitable parametrization, the stochastic dynamics are shown to be globally well posed and to converge exponentially fast to zero risk with high probability, which gives an explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate the theoretical predictions.

📝 Abstract

Understanding the behavior of stochastic gradient methods is a central problem in modern machine learning. Recent work has highlighted diagonal linear networks as a simplified yet expressive setting for analyzing the optimization and generalization properties of neural models. In this work, we show that in the high-dimensional regime, stochastic gradient descent on diagonal linear networks is well-approximated by continuous dynamics governed by a stochastic differential equation (SDE), which explicitly decouples the drift from the gradient noise. We further derive a deterministic partial differential equation whose solution propagates the relevant state of the iterates and characterizes the time evolution of a broad class of observable statistics, including the risk, curvature, and other metrics for optimality. Finally, we show that, under a suitable parametrization, the stochastic dynamics are globally well posed and converge exponentially fast to zero risk with high probability, yielding a fully explicit non-asymptotic description of their long-time behavior. Numerical simulations corroborate our theoretical findings.
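
To make the setting concrete, the following is a minimal sketch of online SGD on a diagonal linear network, written with the Python standard library only. It is not the paper's construction: the parametrization beta = u * v, the isotropic Gaussian inputs, the noiseless labels, the sparse target `beta_star`, and every default value (`d`, `steps`, `lr`, `init`) are assumptions chosen for illustration. The sketch tracks the population risk, one of the observables mentioned in the abstract, along a single SGD trajectory.

```python
import random


def population_risk(u, v, beta_star):
    # With isotropic Gaussian inputs and noiseless labels (both assumptions),
    # the population risk of beta = u * v is 0.5 * ||beta - beta_star||^2.
    return 0.5 * sum((ui * vi - bi) ** 2 for ui, vi, bi in zip(u, v, beta_star))


def simulate_sgd(d=50, steps=3000, lr=0.01, init=0.1, seed=0):
    """Run online SGD on f(x) = <u * v, x>, drawing one fresh sample per step.

    All names and defaults here are hypothetical; the abstract does not
    specify the parametrization, step size, or data model.
    """
    rng = random.Random(seed)
    # Hypothetical sparse target: the first five coordinates are 1, the rest 0.
    beta_star = [1.0 if i < 5 else 0.0 for i in range(d)]
    u = [init] * d
    v = [init] * d
    risks = [population_risk(u, v, beta_star)]
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Residual of the current prediction on the fresh sample.
        residual = sum(
            (ui * vi - bi) * xi for ui, vi, bi, xi in zip(u, v, beta_star, x)
        )
        # Gradient of 0.5 * residual^2 with respect to u is residual * x * v
        # (and symmetrically for v). Both lists are built from the old
        # (u, v) before either name is rebound.
        u, v = (
            [ui - lr * residual * xi * vi for ui, vi, xi in zip(u, v, x)],
            [vi - lr * residual * xi * ui for ui, vi, xi in zip(u, v, x)],
        )
        risks.append(population_risk(u, v, beta_star))
    return risks


if __name__ == "__main__":
    risks = simulate_sgd()
    print("initial risk:", risks[0])
    print("final risk:  ", risks[-1])
```

Because the labels are noiseless, the stochastic gradient is proportional to the residual, so the gradient noise shrinks as the risk shrinks. Averaging the recorded risk over several seeds would give an empirical curve to set against a deterministic prediction of the kind the abstract describes.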

Problem

Research questions and friction points this paper is trying to address.

stochastic gradient descent

diagonal linear networks

high-dimensional regime

optimization

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic gradient descent

diagonal linear networks

stochastic differential equation