On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the convergence of stochastic gradient descent for composite nonconvex optimization problems involving N sequential operators, under perturbations in both the forward and backward passes. By modeling how perturbations propagate and amplify as they cascade through the computational graph, the study gives the first systematic characterization of the coupling between forward and backward perturbations within a single gradient step. This characterization reveals the theoretical mechanism behind gradient spiking and yields precise conditions under which the algorithm converges or diverges. Rigorous convergence guarantees are derived both under the Polyak–Łojasiewicz condition and in the general nonconvex setting. Experiments on regularized logistic regression corroborate the analysis, demonstrating the asymmetric sensitivity of gradients to the two types of perturbation.
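To make the setting concrete, the sketch below simulates one plausible instance of this model: SGD on a chain of N tanh operators, with small additive noise injected into every intermediate forward activation and every backpropagated gradient. Everything here (the operators, the quadratic loss, the scales `delta_fwd` and `delta_bwd`) is an illustrative assumption, not the paper's construction.

```python
import numpy as np

# Minimal sketch (not the paper's code): a composition f = f_N ∘ ... ∘ f_1
# of tanh operators, with additive perturbations injected into both the
# forward activations and the backward gradient signal.
rng = np.random.default_rng(0)
N, d = 4, 10                      # number of sequential operators, dimension
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(N)]
target = rng.standard_normal(d)

def perturbed_step(x, lr=1e-2, delta_fwd=1e-3, delta_bwd=1e-3):
    # Forward pass: each intermediate output is corrupted, so the error
    # cascades through all remaining operators.
    acts = [x]
    for W in Ws:
        h = np.tanh(W @ acts[-1])
        acts.append(h + delta_fwd * rng.standard_normal(d))
    loss = 0.5 * np.linalg.norm(acts[-1] - target) ** 2

    # Backward pass: the chain rule is evaluated on the *perturbed*
    # activations, and the gradient signal is itself perturbed at every
    # operator -- this is the forward/backward coupling in one step.
    g = acts[-1] - target
    grads = []
    for W, a in zip(reversed(Ws), reversed(acts[:-1])):
        pre = W @ a
        g = g * (1.0 - np.tanh(pre) ** 2)   # through the tanh nonlinearity
        grads.append(np.outer(g, a))        # dL/dW for this operator
        g = W.T @ g + delta_bwd * rng.standard_normal(d)
    for W, gW in zip(Ws, reversed(grads)):
        W -= lr * gW                        # in-place SGD update
    return loss

x = rng.standard_normal(d)
for _ in range(200):
    loss = perturbed_step(x)
print(f"loss after 200 perturbed steps: {loss:.4f}")
```

Raising `delta_bwd` relative to `delta_fwd` (or vice versa) in this toy makes the asymmetric sensitivity described above easy to observe, since the two noise sources enter the weight update through different paths.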

📝 Abstract
We study stochastic gradient descent (SGD) for composite optimization problems with $N$ sequential operators subject to perturbations in both the forward and backward passes. Unlike classical analyses that treat gradient noise as additive and localized, perturbations to intermediate outputs and gradients cascade through the computational graph, compounding geometrically with the number of operators. We present the first comprehensive theoretical analysis of this setting. Specifically, we characterize how forward and backward perturbations propagate and amplify within a single gradient step, derive convergence guarantees for both general non-convex objectives and functions satisfying the Polyak–Łojasiewicz condition, and identify conditions under which perturbations do not deteriorate the asymptotic convergence order. As a byproduct, our analysis furnishes a theoretical explanation for the gradient spiking phenomenon widely observed in deep learning, precisely characterizing the conditions under which training recovers from spikes or diverges. Experiments on logistic regression with convex and non-convex regularization validate our theories, illustrating the predicted spike behavior and the asymmetric sensitivity to forward versus backward perturbations.
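For reference, the Polyak–Łojasiewicz (PL) condition named in the abstract is the standard inequality below; the contraction template that classically follows from it for $L$-smooth objectives is included only to fix notation, and the paper's perturbed rates and constants are not reproduced here.

```latex
% PL condition: mu > 0 is the PL constant and f^* = inf_x f(x).
\[
  \tfrac{1}{2}\,\bigl\|\nabla f(x)\bigr\|^{2} \;\ge\; \mu\,\bigl(f(x) - f^{*}\bigr)
  \qquad \text{for all } x.
\]
% Classical unperturbed recursion for L-smooth f, step size eta_t <= 1/L,
% and stochastic gradients with variance bounded by sigma^2:
\[
  \mathbb{E}\bigl[f(x_{t+1}) - f^{*}\bigr]
  \;\le\; \bigl(1 - \eta_t \mu\bigr)\,\mathbb{E}\bigl[f(x_t) - f^{*}\bigr]
  \;+\; \frac{L\,\eta_t^{2}\,\sigma^{2}}{2}.
\]
```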
Problem

Research questions and friction points this paper is trying to address.

stochastic gradient descent
perturbed forward-backward passes
composite optimization
convergence analysis
gradient spiking
Innovation

Methods, ideas, or system contributions that make the work stand out.

stochastic gradient descent
perturbed forward-backward passes
convergence analysis
gradient spiking
composite optimization