Constrained Stochastic Spectral Preconditioning Converges for Nonconvex Objectives

📅 2026-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses convergence in constrained nonconvex optimization under heavy-tailed noise. It proposes a class of proximal preconditioned stochastic gradient algorithms that extend the Muon and Scion optimizers to handle a wide range of convex and nonconvex constraints. The main contributions are a proximal mechanism for constraint handling in spectral gradient methods, a convergence analysis that models Muon's polynomial iterations as a nonlinear preconditioner rather than the ideal matrix sign, and a variance-reduced variant. The family of methods is shown to converge under heavy-tailed noise, and the variance-reduced variant achieves faster convergence under standard noise assumptions. Because the analysis treats the preconditioner as it is actually computed, it reflects practical implementations more faithfully than analyses based on the ideal matrix sign.
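To make the idea of a proximal step on top of a spectral direction concrete, here is a minimal sketch. It is not the paper's algorithm: the update rule (a spectral step followed by a proximal map), the Frobenius-ball constraint, and the names `msign`, `project_frobenius_ball`, and `prox_spectral_step` are all assumptions chosen for illustration, since this summary does not state the actual update.

```python
import numpy as np

def msign(G):
    """Ideal matrix sign (polar factor) of G, computed from a thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def project_frobenius_ball(X, radius):
    """Proximal map of the indicator of {X : ||X||_F <= radius} (a projection)."""
    norm = np.linalg.norm(X)  # Frobenius norm for a 2-D array
    return X if norm <= radius else X * (radius / norm)

def prox_spectral_step(X, G, lr, radius):
    """Hypothetical update: move along the spectral direction, then apply the prox."""
    return project_frobenius_ball(X - lr * msign(G), radius)

# Toy data standing in for a weight matrix and a stochastic gradient.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
G = rng.standard_normal((4, 3))
X_next = prox_spectral_step(X, G, lr=0.1, radius=1.0)
```

In this sketch the constraint is convex and its proximal map is a simple rescaling. The paper also covers nonconvex constraints, which this example does not attempt to reproduce.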
📝 Abstract
In this work, we develop proximal preconditioned gradient methods, with a focus on spectral gradient methods, providing a proximal extension to the Muon and Scion optimizers. We introduce a family of stochastic algorithms that can handle a wide variety of convex and nonconvex constraints and study its convergence under heavy-tailed noise through a novel analysis tailored to the geometry of the proposed methods. We further propose a variance-reduced version, which achieves faster convergence under standard noise assumptions. Finally, we show that the polynomial iterations used in Muon are more accurately captured by a nonlinear preconditioner than by the ideal matrix sign, leading to a convergence analysis that more faithfully reflects practical implementations.
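The last point of the abstract can be illustrated with a small sketch. It uses the classical cubic Newton-Schulz iteration as a stand-in, not Muon's tuned polynomial, and the function names and step count are assumptions for illustration. After a finite number of steps the output has the form U diag(φ(σ)) Vᵀ, where φ is a scalar nonlinear map applied to each normalized singular value, rather than the ideal matrix sign U Vᵀ.

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Cubic Newton-Schulz iteration (stand-in for Muon's polynomial)."""
    X = G / np.linalg.norm(G)  # Frobenius normalization: singular values in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

def scalar_map(s, steps=5):
    """The same polynomial applied to singular values: the nonlinear map phi."""
    for _ in range(steps):
        s = 1.5 * s - 0.5 * s**3
    return s

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))

# Singular values of the normalized input and of the finite-step output.
sigma = np.linalg.svd(G, compute_uv=False) / np.linalg.norm(G)
sv_ns = np.linalg.svd(newton_schulz(G), compute_uv=False)
phi_sigma = scalar_map(sigma)

print("finite-step singular values:", sv_ns)
print("phi(normalized sigma):      ", phi_sigma)
```

The two printed vectors agree, while the ideal matrix sign would have all singular values equal to 1. The gap between them is what a nonlinear-preconditioner view of the iteration accounts for.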
Problem

Research questions and friction points this paper is trying to address.

nonconvex optimization
stochastic gradient methods
spectral preconditioning
heavy-tailed noise
constrained optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

proximal preconditioning
spectral gradient methods
nonconvex constraints
variance reduction
heavy-tailed noise