🤖 AI Summary
This work investigates the slow convergence of gradient descent in nonlinear neural networks when optimizing for adversarial robustness. Focusing on the training dynamics of a two-neuron ReLU network in binary classification, it establishes the first explicit lower bound on the convergence rate of robust decision boundaries in a nonlinear model, proving that the boundary converges to the optimal robust separator at a tight $\Theta(1/\ln(t))$ rate. By precisely characterizing gradient trajectories across distinct activation patterns, the analysis provides rigorous theoretical control over the evolution of the decision boundary. Both theoretical and empirical results demonstrate that this slow convergence persists across various natural initialization schemes, revealing a fundamental efficiency bottleneck in current optimization approaches for robust learning.
📝 Abstract
We study the convergence dynamics of Gradient Descent (GD) in a minimal binary classification setting, consisting of a two-neuron ReLU network and two training instances. We prove that even under these strong simplifying assumptions, while GD successfully converges to an optimal robustness margin, effectively maximizing the distance between the decision boundary and the training points, this convergence occurs at a prohibitively slow rate, scaling as $\Theta(1/\ln(t))$. To the best of our knowledge, this establishes the first explicit lower bound on the convergence rate of the robustness margin in a nonlinear model. Through empirical simulations, we further demonstrate that this inherent failure mode is pervasive, exhibiting the same tight convergence rate across multiple natural network initializations. Our theoretical guarantees are derived via a rigorous analysis of the GD trajectories across the distinct activation patterns of the model. Specifically, we develop tight control over the system's dynamics to bound the trajectory of the decision boundary, overcoming the primary technical challenge introduced by the non-linear nature of the architecture.
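The setting described above can be reproduced in a short simulation. The sketch below is a toy illustration under assumed specifics (1D inputs, logistic loss, a particular asymmetric initialization); the paper's exact data, loss, and initialization are not given in the abstract, so these choices are hypothetical. It trains a two-neuron ReLU network $f(x) = a_1\,\mathrm{relu}(w_1 x + b_1) + a_2\,\mathrm{relu}(w_2 x + b_2)$ on two labeled points by full-batch GD and reports where the decision boundary ends up, along with its margin to the nearest training point.

```python
import numpy as np

# Toy sketch (assumed setup, not the paper's exact construction):
# two 1D training instances with opposite labels, a two-neuron ReLU
# network, logistic loss, full-batch gradient descent.

def relu(z):
    return np.maximum(z, 0.0)

X = np.array([-1.0, 1.0])   # two training instances
y = np.array([-1.0, 1.0])   # binary labels

# Deliberately asymmetric initialization so the boundary starts
# away from the optimal robust separator (x = 0 for this data).
w = np.array([1.5, -0.5])   # input weights
b = np.array([0.2, 0.2])    # biases
a = np.array([1.5, -0.3])   # output weights
lr = 0.1

def forward(x):
    return a @ relu(w * x + b)

for t in range(20000):
    gw = np.zeros(2); gb = np.zeros(2); ga = np.zeros(2)
    for xi, yi in zip(X, y):
        pre = w * xi + b
        act = relu(pre)
        out = a @ act
        # logistic loss l(out) = log(1 + exp(-y*out));
        # dl/dout = -y / (1 + exp(y*out))
        s = -yi / (1.0 + np.exp(yi * out))
        mask = (pre > 0).astype(float)   # active-neuron pattern
        ga += s * act
        gw += s * a * mask * xi
        gb += s * a * mask
    w -= lr * gw; b -= lr * gb; a -= lr * ga

# Locate the decision boundary as the sign change of f on a grid,
# then report the robust margin (distance to the nearest point).
grid = np.linspace(-1.0, 1.0, 20001)
vals = np.array([forward(g) for g in grid])
idx = np.where(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]
boundary = grid[idx[0]] if len(idx) else float("nan")
margin = 1.0 - abs(boundary)
print(f"boundary ~= {boundary:.4f}, margin ~= {margin:.4f}")
```

Tracking `boundary` at logarithmically spaced iterations (e.g. $t = 10^2, 10^3, 10^4, \dots$) makes the predicted $\Theta(1/\ln(t))$ drift toward the midpoint visible: each tenfold increase in compute shrinks the gap by only a roughly constant amount.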