Modified Loss of Momentum Gradient Descent: Fine-Grained Analysis

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the continuous-time approximation of gradient descent with Polyak's heavy-ball (HB) momentum on exponentially attracting invariant manifolds. Using dynamical systems analysis, invariant manifold theory, and asymptotic expansion, the authors rigorously prove that, for a sufficiently small step size h, HB is equivalent to standard gradient descent optimizing an implicitly modified loss function, with global approximation accuracy of arbitrary order O(h^R). Furthermore, they uncover combinatorial structures, including Eulerian and Narayana polynomials, in the memoryless approximation, enabling the construction of arbitrary-order continuous modified equations and dominant dynamics that uniformly characterize both full-batch and mini-batch regimes. The results generalize the continuous modeling of HB momentum from first-order to arbitrary-order approximations, establishing a unified theoretical framework for high-fidelity dynamical analysis of momentum-based optimization methods.

📝 Abstract
We analyze gradient descent with Polyak heavy-ball momentum (HB) whose fixed momentum parameter $\beta \in (0, 1)$ provides exponential decay of memory. Building on Kovachki and Stuart (2021), we prove that on an exponentially attractive invariant manifold the algorithm is exactly plain gradient descent with a modified loss, provided that the step size $h$ is small enough. Although the modified loss does not admit a closed-form expression, we describe it with arbitrary precision and prove global (finite "time" horizon) approximation bounds $O(h^{R})$ for any finite order $R \geq 2$. We then conduct a fine-grained analysis of the combinatorics underlying the memoryless approximations of HB, in particular uncovering a rich hidden family of polynomials in $\beta$ that contains the Eulerian and Narayana polynomials. We derive continuous modified equations of arbitrary approximation order (with rigorous bounds) and the principal flow that approximates the HB dynamics, generalizing Rosca et al. (2023). Approximation theorems cover both full-batch and mini-batch HB. Our theoretical results shed new light on the main features of gradient descent with heavy-ball momentum, and outline a road-map for similar analysis of other optimization algorithms.
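A minimal numerical sketch (not from the paper) of the memoryless view described above: on a simple 1-D quadratic loss, the HB iterates stay close to plain gradient descent run with the effective step size h / (1 - beta), the leading-order version of the modified-loss equivalence. The quadratic loss and all parameter values here are illustrative choices, not the paper's setup.

```python
# Compare heavy-ball momentum with plain gradient descent on L(x) = x^2 / 2.
# For small step size h, HB with momentum beta tracks plain GD run with the
# effective step size h / (1 - beta); higher-order corrections to the loss
# (the subject of the paper) are of size O(h^2) and ignored in this sketch.

def grad(x):
    # Gradient of the toy loss L(x) = x^2 / 2.
    return x

def heavy_ball(x0, h, beta, steps):
    # Polyak heavy-ball: x_{k+1} = x_k - h * grad(x_k) + beta * (x_k - x_{k-1}).
    x_prev, x = x0, x0
    for _ in range(steps):
        x_prev, x = x, x - h * grad(x) + beta * (x - x_prev)
    return x

def plain_gd(x0, h_eff, steps):
    # Plain gradient descent with the effective step size.
    x = x0
    for _ in range(steps):
        x = x - h_eff * grad(x)
    return x

h, beta, steps = 1e-3, 0.9, 2000
x_hb = heavy_ball(1.0, h, beta, steps)
x_gd = plain_gd(1.0, h / (1 - beta), steps)
print(abs(x_hb - x_gd))  # small for small h
```

Shrinking h further tightens the agreement, consistent with the O(h^R) bounds the paper proves for arbitrary order R.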
Problem

Research questions and friction points this paper is trying to address.

Analyzing gradient descent with Polyak heavy-ball momentum dynamics
Deriving modified loss approximations with rigorous error bounds
Investigating combinatorial structures in memoryless momentum approximations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact equivalence of HB to modified-loss gradient descent on an invariant manifold
Global O(h^R) approximation bounds of arbitrary order in the step size
Continuous modified equations of arbitrary order with rigorous bounds