🤖 AI Summary
This work addresses convergence in constrained non-convex optimization under heavy-tailed noise by proposing a family of proximal preconditioned stochastic gradient algorithms that extend the Muon and Scion optimizers to a wide range of convex and non-convex constraints. The key contributions are a proximal mechanism for handling constraints in spectral gradient methods, a convergence analysis that models Muon's polynomial iterations as a nonlinear preconditioner rather than as the ideal matrix sign, and a variance-reduced variant that accelerates convergence. Theoretically, the methods are shown to converge under heavy-tailed noise, and the variance-reduced version achieves faster convergence under standard noise assumptions. Because it models the preconditioner that is actually computed, the analysis reflects practical implementations more faithfully than existing approaches.
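The proximal spectral step summarized above can be illustrated with a few lines of NumPy. This is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes an update of the form X <- prox(X - lr * msign(M)), where M is an exponential moving average of stochastic gradients and msign is computed exactly with an SVD. The helper names (`msign`, `prox_frobenius_ball`, `prox_rank`, `prox_spectral_step`), the toy objective, and the Gaussian noise are all hypothetical. The two proximal maps show one convex constraint (a Frobenius-norm ball) and one nonconvex constraint (a rank bound).

```python
import numpy as np

def msign(G):
    """Exact matrix sign (polar factor) U V^T of G via a thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def prox_frobenius_ball(X, radius):
    """Projection onto the convex set {X : ||X||_F <= radius}."""
    norm = np.linalg.norm(X)  # Frobenius norm for 2-D arrays
    return X if norm <= radius else X * (radius / norm)

def prox_rank(X, r):
    """Projection onto the nonconvex set {X : rank(X) <= r} via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def prox_spectral_step(X, M, grad, lr, beta, prox):
    """One hypothetical proximal spectral (Muon-style) step.

    The momentum buffer M is updated first; the forward step moves along its
    polar factor, and the proximal map then restores feasibility.
    """
    M = beta * M + (1.0 - beta) * grad
    X_new = prox(X - lr * msign(M))
    return X_new, M

# Toy problem (hypothetical): minimize 0.5 * ||X - A||_F^2 subject to rank(X) <= 2.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
X = np.zeros_like(A)
M = np.zeros_like(A)
for _ in range(200):
    # Noisy gradient of the toy objective (Gaussian noise, not heavy-tailed).
    grad = (X - A) + 0.1 * rng.standard_normal(A.shape)
    X, M = prox_spectral_step(X, M, grad, lr=0.05, beta=0.9,
                              prox=lambda Y: prox_rank(Y, 2))

print(np.linalg.matrix_rank(X))  # at most 2: the constraint holds at every iterate
```

Because the proximal map is applied after every forward step, each iterate is feasible; passing `lambda Y: prox_frobenius_ball(Y, radius)` instead changes only the constraint, not the update direction.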
📝 Abstract
In this work, we develop proximal preconditioned gradient methods, with a focus on spectral gradient methods, that provide a proximal extension to the Muon and Scion optimizers. We introduce a family of stochastic algorithms that can handle a wide variety of convex and nonconvex constraints, and we study its convergence under heavy-tailed noise through a novel analysis tailored to the geometry of the proposed methods. We further propose a variance-reduced version, which achieves faster convergence under standard noise assumptions. Finally, we show that the polynomial iterations used in Muon are more accurately captured by a nonlinear preconditioner than by the ideal matrix sign, which leads to a convergence analysis that more faithfully reflects practical implementations.
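The final point of the abstract, that Muon's polynomial iterations behave like a nonlinear preconditioner rather than the ideal matrix sign, can be checked numerically. The sketch below compares the exact polar factor U V^T (computed with an SVD) against five quintic Newton-Schulz iterations. The coefficients and the five-step default are assumptions, taken from the public Muon reference implementation as we understand it rather than from this paper; the function names are hypothetical.

```python
import numpy as np

# Quintic Newton-Schulz coefficients; assumed to match the public Muon
# reference implementation (an assumption, not taken from this paper).
A_COEF, B_COEF, C_COEF = 3.4445, -4.7750, 2.0315

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximate the polar factor of G with a fixed odd matrix polynomial."""
    X = G / (np.linalg.norm(G) + eps)  # Frobenius scaling keeps singular values at most 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # use the wide orientation so the Gram matrix X @ X.T is small
    for _ in range(steps):
        S = X @ X.T
        # Acts on each singular value as sigma -> a*sigma + b*sigma^3 + c*sigma^5
        X = A_COEF * X + (B_COEF * S + C_COEF * S @ S) @ X
    return X.T if transposed else X

def exact_msign(G):
    """Ideal matrix sign (polar factor) U V^T from a thin SVD."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
G = rng.standard_normal((64, 32))
P = newton_schulz(G)

s_exact = np.linalg.svd(exact_msign(G), compute_uv=False)
s_poly = np.linalg.svd(P, compute_uv=False)
print(s_exact.min(), s_exact.max())  # both 1 up to rounding
print(s_poly.min(), s_poly.max())    # spread around 1, not equal to 1
```

Both outputs share the singular vectors of G; only the map applied to the singular values differs. The polynomial version leaves them spread around 1 instead of setting them all to 1, which is why modeling it as a nonlinear preconditioner describes the computed update more closely than the matrix sign does.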