Exact Dynamics of Multi-class Stochastic Gradient Descent

📅 2025-10-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the dynamics and learning-rate evolution of single-pass stochastic gradient descent (SGD) on high-dimensional, multi-class, anisotropic data. We develop the first exact dynamical analysis framework for multi-class SGD under Gaussian mixture models with covariance spectra exhibiting power-law, zero-one, or isotropic structure, crucially allowing the number of classes to grow with the dimension. Our method employs a mean-field ordinary differential equation (ODE) approximation to characterize the asymptotic training trajectory. Theoretically, we show that covariance anisotropy induces sharp phase transitions, and we derive closed-form expressions for both the population risk and the overlap with the true signal. We prove that SGD spontaneously aligns with low-variance “clean” subspaces, yielding substantial generalization gains. This extends high-dimensional SGD theory to large-scale multi-class settings and provides verifiable, asymptotically exact characterizations of the training loss.
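
As a rough illustration of the setting described in the summary (not code from the paper), the sketch below runs single-pass SGD for binary logistic regression on a two-class Gaussian mixture whose covariance has a power-law spectrum, and tracks the overlap between the SGD iterate and the class-mean signal. The dimension, learning-rate scaling, number of steps, and spectrum exponent are all illustrative assumptions.

```python
# Illustrative sketch, not the paper's code: single-pass SGD for binary
# logistic regression on a two-class Gaussian mixture with an anisotropic
# (power-law) covariance spectrum. All parameter values are assumptions.
import numpy as np

rng = np.random.default_rng(0)

d = 2000                          # ambient dimension (assumed)
steps = 20_000                    # one fresh sample per step (single pass)
lr = 1.0 / d                      # learning rate scaled with dimension (assumed)

mu = rng.standard_normal(d) / np.sqrt(d)      # class-mean signal with norm ~ 1
spectrum = np.arange(1, d + 1) ** -1.0        # power-law covariance eigenvalues
noise_std = np.sqrt(spectrum)                 # diagonal covariance for simplicity

w = np.zeros(d)
overlaps = []
for t in range(steps):
    y = rng.choice([-1.0, 1.0])                      # balanced class label
    x = y * mu + noise_std * rng.standard_normal(d)  # sample from class y
    margin = y * (w @ x)
    grad = -y * x / (1.0 + np.exp(margin))           # single-sample logistic gradient
    w -= lr * grad
    if t % 2000 == 0:
        # cosine overlap between the iterate and the true class-mean direction
        overlaps.append(w @ mu / (np.linalg.norm(w) * np.linalg.norm(mu) + 1e-12))

print("overlap with class mean during training:", np.round(overlaps, 3))
```

According to the abstract, in the high-dimensional limit quantities such as this overlap and the population risk are characterized exactly by the deterministic solution of a system of ODEs; the simulation above only illustrates a finite-dimensional instance.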

📝 Abstract
We develop a framework for analyzing the training and learning-rate dynamics of a variety of high-dimensional optimization problems trained using one-pass stochastic gradient descent (SGD) with data generated from multiple anisotropic classes. We give exact expressions for a large class of functions of the limiting dynamics, including the risk and the overlap with the true signal, in terms of a deterministic solution to a system of ODEs. We extend the existing theory of high-dimensional SGD dynamics to Gaussian-mixture data and a large (growing with the parameter size) number of classes. We then investigate in detail the effect of the anisotropic structure of the data covariance in the problems of binary logistic regression and least-squares loss. We study three cases: isotropic covariances, data covariance matrices with a large fraction of zero eigenvalues (denoted as the zero-one model), and covariance matrices with spectra following a power-law distribution. We show that there exists a structural phase transition. In particular, we demonstrate that, for the zero-one model and the power-law model with sufficiently large power, SGD tends to align more closely with the components of the class mean projected onto the "clean directions" (i.e., directions of smaller variance). This is supported by both numerical simulations and analytical studies, which give the exact asymptotic behavior of the loss in the high-dimensional limit.
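
To make the "clean directions" effect concrete, here is a minimal simulation sketch (an assumption-laden illustration, not the paper's experiments) of the zero-one covariance model: a fraction of coordinates carries zero noise variance (clean directions), the rest carries unit variance, and the overlap of the SGD iterate with the class mean is measured separately on the two subspaces.

```python
# Minimal sketch of the zero-one covariance model from the abstract: half of
# the coordinates have zero noise variance ("clean" directions), the rest have
# unit variance ("noisy" directions). Parameter choices are assumptions.
import numpy as np

rng = np.random.default_rng(1)

d = 2000
frac_clean = 0.5                                  # fraction of zero-variance directions (assumed)
clean = np.zeros(d, dtype=bool)
clean[: int(frac_clean * d)] = True
noise_std = np.where(clean, 0.0, 1.0)             # zero-one covariance spectrum

mu = rng.standard_normal(d) / np.sqrt(d)          # class-mean signal
lr, steps = 1.0 / d, 20_000
w = np.zeros(d)
for _ in range(steps):
    y = rng.choice([-1.0, 1.0])
    x = y * mu + noise_std * rng.standard_normal(d)
    margin = y * (w @ x)
    w += lr * y * x / (1.0 + np.exp(margin))      # SGD step on the logistic loss

def cos_overlap(a, b):
    """Cosine similarity, guarded against zero vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

print("overlap with class mean, clean directions:", cos_overlap(w[clean], mu[clean]))
print("overlap with class mean, noisy directions:", cos_overlap(w[~clean], mu[~clean]))
```

Under this toy setup the overlap restricted to the clean subspace should come out visibly larger than on the noisy subspace, which is the qualitative behavior the abstract attributes to the zero-one model and to power-law spectra with sufficiently large exponent.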
Problem

Research questions and friction points this paper is trying to address.

Analyzes SGD dynamics for multi-class anisotropic data in high dimensions
Extends theory to Gaussian-mixture data with growing class numbers
Investigates anisotropic covariance effects on logistic regression and least squares
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact expressions for SGD dynamics via ODE system
Extends high-dimensional SGD theory to Gaussian-mixture data
Shows structural phase transition in anisotropic covariances
Elizabeth Collins-Woodfin
Department of Mathematics, University of Oregon, Eugene, OR, USA
Inbar Seroussi
Tel-Aviv University, Israel
Probability theory · Statistical physics · Machine learning