Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of classical algorithmic stability analyses in non-convex optimization, which typically require a stringent learning rate decay of \(O(1/t)\)—a condition that hampers optimization efficiency and contradicts common empirical practice. Focusing on homogeneous neural networks, such as fully connected or convolutional architectures with ReLU or LeakyReLU activations, the paper bridges algorithmic stability theory with non-convex optimization analysis to relax this constraint. Under mild assumptions, it establishes generalization bounds that permit a significantly slower learning rate decay of \(\Omega(1/\sqrt{t})\). The resulting framework not only accommodates non-Lipschitz settings but also markedly improves the alignment between theoretical guarantees and practical training dynamics.
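The homogeneity property the summary relies on can be checked numerically: for a bias-free ReLU network with \(L\) weight layers, scaling every parameter by \(c > 0\) scales the output by \(c^L\). A minimal sketch (the network sizes and seed are illustrative, not from the paper):

```python
import numpy as np

def relu_mlp(weights, x):
    """Forward pass of a bias-free MLP with ReLU hidden layers."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)   # ReLU is positively homogeneous of degree 1
    return weights[-1] @ h           # linear output layer

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)),
           rng.standard_normal((4, 4)),
           rng.standard_normal((1, 4))]
x = rng.standard_normal(3)

c, L = 2.5, len(weights)
scaled = [c * W for W in weights]
# Homogeneity: f(c * theta; x) == c**L * f(theta; x) for c > 0
assert np.allclose(relu_mlp(scaled, x), (c ** L) * relu_mlp(weights, x))
```

The same identity holds for LeakyReLU and for convolutional layers without biases, which is why the paper's results cover those architectures as well.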

📝 Abstract
Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ under non-convex training regimes, where $t$ denotes iterations. This rigid decay of the stepsize potentially impedes optimization and may not align with practical scenarios. In this paper, we derive the generalization bounds under the homogeneous neural network regimes, proving that this regime enables slower stepsize decay of order $\Omega(1/\sqrt{t})$ under mild assumptions. We further extend the theoretical results from several aspects, e.g., non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
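The practical gap between the two schedules can be seen by comparing the total step length each accumulates over a training run. A small illustrative computation (the horizon $T$ and unit constants are assumptions, not from the paper):

```python
import numpy as np

T = 10_000
t = np.arange(1, T + 1)
eta_fast = 1.0 / t             # classical stability requirement: eta_t = O(1/t)
eta_slow = 1.0 / np.sqrt(t)    # decay permitted by this paper: eta_t = Omega(1/sqrt(t))

# Sum of O(1/t) steps grows only like ln(T), while the 1/sqrt(t) schedule
# accumulates roughly 2*sqrt(T) of total step length -- far more room to optimize.
print(f"total step length, 1/t:       {eta_fast.sum():.1f}")
print(f"total step length, 1/sqrt(t): {eta_slow.sum():.1f}")
```

The $1/\sqrt{t}$ schedule covers an order of magnitude more cumulative step length at this horizon, which is the sense in which the rigid $\mathcal{O}(1/t)$ decay "potentially impedes optimization."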
Problem

Research questions and friction points this paper is trying to address.

generalization bounds
algorithmic stability
stochastic gradient descent
homogeneous neural networks
stepsize decay
Innovation

Methods, ideas, or system contributions that make the work stand out.

algorithmic stability
generalization bounds
homogeneous neural networks
stochastic gradient descent
learning rate decay