Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

📅 2023-11-24
🏛️ International Conference on Machine Learning
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the dual challenges of margin maximization and convergence speed for gradient-based classifiers. We propose Progressive Rescaling Gradient Descent (PRGD), a novel optimization algorithm. Theoretically, we show that standard gradient descent (GD) and normalized GD (NGD) achieve only polynomial convergence rates and, under mild conditions on the data distribution, provably fail to maximize the margin efficiently; in contrast, PRGD attains exponential convergence to the maximum-margin solution for linearly separable data. Our approach stems from a rigorous analysis of the velocity field associated with (normalized) gradients and introduces a dynamic norm-rescaling mechanism. We provide formal convergence guarantees and validate PRGD empirically on both synthetic and real-world datasets. Experiments demonstrate that PRGD drives exponential margin growth on separable data and improves generalization on non-separable data and deep neural networks.
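
To make the mechanism concrete, here is a minimal NumPy sketch of a PRGD-style training loop for a linear classifier under the logistic loss: normalized gradient steps interleaved with periodic rescaling of the parameter norm. The rescaling schedule (`rescale_every`, `growth`) and the choice of loss are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def normalized_margin(w, X, y):
    """Normalized margin: min_i y_i * <w, x_i> / ||w||_2."""
    return np.min(y * (X @ w)) / (np.linalg.norm(w) + 1e-12)

def prgd_sketch(X, y, steps=200, lr=0.1, rescale_every=20, growth=2.0):
    """Hypothetical PRGD-style loop: normalized GD on the logistic loss,
    plus periodic progressive rescaling of the parameter norm."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, steps + 1):
        z = -y * (X @ w)                     # negative signed margins
        p = 0.5 * (1.0 + np.tanh(0.5 * z))   # sigmoid(z), numerically stable
        grad = -(X.T @ (p * y)) / n          # gradient of the logistic loss
        g_norm = np.linalg.norm(grad)
        if g_norm > 0:
            w -= lr * grad / g_norm          # normalized GD step
        if t % rescale_every == 0:           # progressive norm rescaling
            w *= growth
    return w, normalized_margin(w, X, y)
```

Intuitively, enlarging the parameter norm sharpens the exponential tail of the loss so that subsequent normalized steps are dominated by the near-margin points; this is only a heuristic reading of the paper's velocity-field analysis, not a restatement of its proof.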
📝 Abstract
In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an exponential rate. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow polynomial rate. Specifically, we identify mild conditions on data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) provably fail in maximizing the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
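
For reference, the quantity at stake is the normalized margin of a linear classifier. The sketch below states the standard definition and an informal rate comparison; the GD/NGD rates are quoted from prior work on the implicit bias of gradient methods and are indicative only up to logarithmic factors, while the PRGD rate is the paper's headline claim.

```latex
% Normalized margin of w on data {(x_i, y_i)} and the max-margin value:
\gamma(w) = \min_i \frac{y_i \langle w, x_i \rangle}{\|w\|_2},
\qquad
\gamma^{*} = \max_{\|v\|_2 = 1} \min_i\, y_i \langle v, x_i \rangle .

% Informal rate comparison on linearly separable data:
%   GD:   \gamma^{*} - \gamma(w_t) = O(1/\log t)
%   NGD:  \gamma^{*} - \gamma(w_t) = \tilde{O}(1/t)
%   PRGD: \gamma^{*} - \gamma(w_t) \le \exp(-\Omega(t))
```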
Problem

Research questions and friction points this paper is trying to address.

Data Classification
Accuracy Improvement
Deep Learning Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

PRGD Algorithm
Efficiency Enhancement
Accuracy Improvement
Mingze Wang
School of Mathematical Sciences, Peking University
Machine Learning Theory · Deep Learning Theory · Optimization
Zeping Min
School of Mathematical Sciences, Peking University, Beijing, China
Lei Wu
School of Mathematical Sciences, Peking University, Beijing, China; Center for Machine Learning Research, Peking University, Beijing, China