🤖 AI Summary
This work addresses the dual challenges of margin maximization and convergence rate in deep learning classifiers. We propose Progressive Rescaling Gradient Descent (PRGD), a novel optimization algorithm. Theoretically, we show that standard gradient descent (GD) and normalized GD (NGD) achieve only polynomial convergence rates under common data distributions and provably fail to maximize the margin efficiently in certain regimes; in contrast, PRGD attains exponential convergence to the maximum-margin solution for linearly separable data. Our approach stems from a rigorous analysis of the velocity field induced by (normalized) gradients and introduces a dynamic norm-rescaling mechanism. We provide formal convergence guarantees and validate PRGD empirically on both synthetic and real-world datasets. Experiments demonstrate that PRGD drives exponential margin growth on separable data and improves generalization performance on non-separable data and deep neural networks.
📝 Abstract
In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the velocity field associated with (normalized) gradients, focusing on its role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an *exponential rate*. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow *polynomial rate*. Specifically, we identify mild conditions on the data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) *provably fail* to maximize the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
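The abstract does not spell out the algorithm, but the core idea of combining normalized gradient steps with progressive norm rescaling can be illustrated on a toy separable problem. The sketch below is an assumption-laden illustration, not the paper's exact PRGD: the rescaling factor, schedule, loss choice (exponential loss), and step size are all hypothetical, chosen only to show how periodically inflating the parameter norm on top of normalized GD can drive the normalized margin up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: labels y = ±1, first coordinate carries the signal.
n = 200
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
X = np.column_stack([y + 0.1 * rng.normal(size=n), rng.normal(size=n)])

def exp_loss_grad(w):
    """Gradient of the empirical exponential loss (1/n) * sum_i exp(-y_i <w, x_i>)."""
    margins = y * (X @ w)
    # Clip the exponent for numerical stability (illustration only).
    coeffs = -y * np.exp(-np.clip(margins, -50.0, 50.0))
    return (coeffs[:, None] * X).mean(axis=0)

def normalized_margin(w):
    """min_i y_i <w, x_i> / ||w||, the quantity PRGD is designed to maximize."""
    return (y * (X @ w)).min() / np.linalg.norm(w)

w = rng.normal(size=2)
eta = 0.1
for t in range(1, 2001):
    g = exp_loss_grad(w)
    w -= eta * g / (np.linalg.norm(g) + 1e-12)  # normalized GD step
    # Hypothetical rescaling schedule: once the data is separated, periodically
    # inflate ||w||; with the exponential loss this concentrates the gradient
    # on the support vectors and accelerates margin growth.
    if t % 200 == 0 and normalized_margin(w) > 0:
        w *= 2.0

print(f"normalized margin: {normalized_margin(w):.4f}")
```

On this easy dataset the normalized margin becomes positive after a few NGD steps, after which the rescaling phases take over; plotting `normalized_margin(w)` against `t` makes the contrast with plain GD (which grows the norm only logarithmically) visible.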