🤖 AI Summary
This paper addresses an analytical challenge in non-convex optimization: the degenerate update directions of standard SGD, which impede stationary-distribution-based convergence analysis. We propose Poisson SGD, a novel SGD variant with a stochastic learning rate. Methodologically, we integrate randomized step sizes with the theory of piecewise-deterministic Markov processes (PDMPs), approximating the dynamics via the Bouncy Particle Sampler and establishing rigorous convergence of the parameter distribution to a non-trivial stationary distribution under weak assumptions on the loss. Theoretically, our work removes the reliance of existing SGD stationary-distribution analyses on non-degenerate update directions, yielding a provably convergent framework for SGD with stochastic step sizes. It further provides guarantees for attaining global minima and derives an upper bound on the generalization error. Empirical results demonstrate superior optimization robustness compared with SGD using fixed or decaying learning rates.
📝 Abstract
We consider a variant of stochastic gradient descent (SGD) with a random learning rate and reveal its convergence properties. SGD is a widely used stochastic optimization algorithm in machine learning, especially deep learning. Numerous studies have revealed the convergence properties of SGD and its simplified variants. Among these, analyses based on the stationary distribution of the updated parameters provide generalizable results. However, to obtain a stationary distribution, the update direction of the parameters must not degenerate, which limits the applicable variants of SGD. In this study, we consider a novel SGD variant, Poisson SGD, which has degenerate parameter update directions and instead utilizes a random learning rate. We demonstrate that the distribution of parameters updated by Poisson SGD converges to a stationary distribution under weak assumptions on the loss function. Based on this, we further show that Poisson SGD finds global minima in non-convex optimization problems, and we also evaluate the generalization error of this method. As a proof technique, we approximate the distribution induced by Poisson SGD with that of the bouncy particle sampler (BPS) and derive its stationary distribution, using theoretical advances in piecewise-deterministic Markov processes (PDMPs).
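To make the idea of SGD with a random learning rate concrete, the following is a minimal illustrative sketch. It assumes the step size is drawn i.i.d. from an exponential distribution at each iteration (a natural choice given the Poisson-process connection, but an assumption here; the paper's exact construction may differ). The objective `noisy_grad` and all parameter names are hypothetical.

```python
import random

def poisson_sgd(grad, theta0, rate=10.0, n_steps=5000, seed=0):
    """Sketch of SGD with a random learning rate.

    At each step the learning rate is drawn from an Exponential(rate)
    distribution (illustrative assumption; the paper's exact
    Poisson-process-based construction may differ).
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(n_steps):
        eta = rng.expovariate(rate)   # random step size, mean 1/rate
        g = grad(theta, rng)          # stochastic gradient estimate
        theta = theta - eta * g
    return theta

# Toy example: noisy gradient of f(x) = (x - 3)^2 / 2.
def noisy_grad(x, rng):
    return (x - 3.0) + rng.gauss(0.0, 0.1)

theta_final = poisson_sgd(noisy_grad, theta0=10.0, rate=10.0)
```

Because the step size never decays, the iterates do not collapse to a point but fluctuate around the minimizer, which is consistent with convergence to a non-trivial stationary distribution rather than to a single parameter value.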