🤖 AI Summary
This work addresses the limitations of classical algorithmic stability analyses in non-convex optimization, which typically require a stringent learning rate decay of \(O(1/t)\)—a condition that hampers optimization efficiency and contradicts common empirical practice. Focusing on homogeneous neural networks, such as fully connected or convolutional architectures with ReLU or LeakyReLU activations, the paper bridges algorithmic stability theory with non-convex optimization analysis to relax this constraint. Under mild assumptions, it establishes generalization bounds that permit a significantly slower learning rate decay of \(\Omega(1/\sqrt{t})\). The resulting framework not only accommodates non-Lipschitz settings but also markedly improves the alignment between theoretical guarantees and practical training dynamics.
📝 Abstract
Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ under non-convex training regimes, where $t$ denotes the iteration. This rigid decay of the stepsize potentially impedes optimization and may not align with practical scenarios. In this paper, we derive generalization bounds under the homogeneous neural network regime, proving that this regime permits a slower stepsize decay of order $\Omega(1/\sqrt{t})$ under mild assumptions. We further extend the theoretical results in several directions, e.g., to non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
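To illustrate why the relaxed decay rate matters in practice, the following minimal sketch compares the two stepsize schedules discussed above. The constant `c` and the function names are illustrative assumptions, not taken from the paper:

```python
import math

def stepsize_classical(t: int, c: float = 0.1) -> float:
    """Classical stability analyses require eta_t = O(1/t) decay."""
    return c / t

def stepsize_relaxed(t: int, c: float = 0.1) -> float:
    """Slower Omega(1/sqrt(t)) decay, permitted in the homogeneous
    neural network regime under the paper's assumptions."""
    return c / math.sqrt(t)

# After 10,000 iterations, the relaxed schedule retains a stepsize
# 100x larger than the classical one, which is why slower decay
# can substantially speed up optimization.
t = 10_000
print(stepsize_classical(t))  # 1e-05
print(stepsize_relaxed(t))    # 0.001
```

The gap between the two schedules grows as $\sqrt{t}$, so the benefit of the relaxed rate compounds over long training runs.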