The late-stage training dynamics of (stochastic) subgradient descent on homogeneous neural networks

📅 2025-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies the implicit bias of constant-step-size stochastic subgradient descent (SGD) on homogeneous neural networks (e.g., bias-free ReLU MLPs and CNNs) during late-stage training for binary classification. Assuming the data are correctly classified with a positive normalized margin, the authors extend the discrete-time gradient descent analysis of Lyu and Li (2020) to the nonsmooth, stochastic setting, showing that the normalized SGD iterates converge almost surely to the set of critical points of the normalized classification margin. Methodologically, the paper interprets the normalized iterates as an Euler-like discretization of a conservative field flow associated with the normalized margin, combining conservative field calculus, nonsmooth dynamical systems theory, and stochastic subgradient optimization to obtain convergence guarantees under the exponential and logistic losses. This gives the first characterization of SGD's implicit regularization in a genuinely stochastic and nonsmooth regime, separating the evolution of the iterates' direction from the growth of their norm during late-phase training.
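To fix notation, the following schematic spells out the objects the summary refers to, using the standard setup of this literature (as in Lyu and Li, 2020). It is a sketch only: the paper's exact assumptions, sampling scheme, and norm conventions may differ and should be taken from the paper itself.

```latex
% Schematic setup (standard notation; see the paper for precise assumptions).
% L-homogeneous network: \Phi(c\,\theta; x) = c^{L}\,\Phi(\theta; x) for all c > 0.
\[
  \mathcal{L}(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} e^{-y_i\,\Phi(\theta;\,x_i)},
  \qquad
  \bar{\gamma}(\theta) \;=\; \frac{\min_{1\le i\le n} y_i\,\Phi(\theta;\,x_i)}{\|\theta\|_2^{L}}
  \quad\text{(normalized margin)}.
\]
% Constant-step stochastic subgradient descent and its normalized iterates:
\[
  \theta_{k+1} \;=\; \theta_k \;-\; \eta\, g_k,
  \qquad g_k \in \partial \ell\!\left(\theta_k;\, x_{i_k}, y_{i_k}\right),
  \qquad
  \tilde{\theta}_k \;=\; \frac{\theta_k}{\|\theta_k\|_2}.
\]
% Late-stage claim (informal): once the data are classified with positive
% normalized margin, the sequence (\tilde{\theta}_k) behaves like an
% Euler-like discretization of a conservative-field flow for \bar{\gamma}
% on the sphere and converges to the critical points of \bar{\gamma}.
```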

📝 Abstract
We analyze the implicit bias of constant-step-size stochastic subgradient descent (SGD). We consider the setting of binary classification with homogeneous neural networks, a large class of deep neural networks with ReLU-type activation functions, such as MLPs and CNNs without bias terms. We interpret the dynamics of normalized SGD iterates as an Euler-like discretization of a conservative field flow that is naturally associated with the normalized classification margin. Owing to this interpretation, we show that normalized SGD iterates converge to the set of critical points of the normalized margin at late-stage training (i.e., assuming that the data are correctly classified with positive normalized margin). To the best of our knowledge, this is the first extension of the analysis of Lyu and Li (2020) on the discrete dynamics of gradient descent to the nonsmooth and stochastic setting. Our main result applies to binary classification with the exponential or logistic loss. We additionally discuss extensions to more general settings.
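
To make the setting concrete, below is a minimal, self-contained sketch (not the paper's code or experiments): constant-step-size SGD with the exponential loss on a bias-free two-layer ReLU network, which is 2-homogeneous, while tracking the normalized margin. The toy data, architecture, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's code): constant-step stochastic subgradient
# descent on a bias-free two-layer ReLU network (2-homogeneous) with the
# exponential loss, tracking the normalized margin during training.
import numpy as np

rng = np.random.default_rng(0)

# Toy data, separable by the sign of the first coordinate (assumption).
n, d, width = 64, 5, 32
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0]) + (X[:, 0] == 0)        # labels in {-1, +1}

W = rng.normal(size=(width, d)) / np.sqrt(d)  # first layer, no bias
a = rng.normal(size=width) / np.sqrt(width)   # second layer, no bias

def forward(x):
    h = np.maximum(W @ x, 0.0)                # ReLU hidden layer
    return a @ h

def normalized_margin():
    # min_i y_i * Phi(theta; x_i) / ||theta||^L  with L = 2 for this network.
    norm_sq = np.sum(W ** 2) + np.sum(a ** 2)
    margins = np.array([y[i] * forward(X[i]) for i in range(n)])
    return margins.min() / norm_sq

eta = 1e-2                                    # constant step size
for k in range(20000):
    i = rng.integers(n)                       # sample one example (SGD)
    x, yi = X[i], y[i]
    h = np.maximum(W @ x, 0.0)
    out = a @ h
    # Exponential loss ell = exp(-y * Phi); one subgradient w.r.t. (W, a).
    coeff = -yi * np.exp(np.clip(-yi * out, None, 30.0))  # clip for safety
    grad_a = coeff * h
    grad_W = coeff * np.outer(a * (h > 0), x) # (h > 0): a ReLU subgradient choice
    a -= eta * grad_a
    W -= eta * grad_W
    if k % 5000 == 0:
        print(f"step {k:6d}  normalized margin = {normalized_margin():.4f}")
```

Once all points are correctly classified, the printed normalized margin becomes positive; the paper's late-stage analysis concerns the directional dynamics of the iterates from that point on.
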
Problem

Research questions and friction points this paper is trying to address.

Analyzing SGD's implicit bias
Binary classification with homogeneous networks
Convergence to normalized margin critical points
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic subgradient descent analysis
Conservative field flow interpretation
Binary classification with homogeneous networks