🤖 AI Summary
This work addresses the computational burden of the quadratic projections induced by operator-norm constraints in online matrix optimization, which plagues existing adaptive algorithms such as Shampoo. By extending the gradient-based prediction scheme to adaptive matrix online learning, the authors introduce the notion of “admissible smoothing” and construct a family of smoothed potential functions for the nuclear norm. This leads to efficient algorithms with closed-form updates that eliminate the need for expensive projections. Specifically, the proposed methods, an adaptive Follow-the-Perturbed-Leader (FTPL) method with Gaussian stochastic smoothing and Follow-the-Augmented-Matrix-Leader (FAML) with a deterministic hyperbolic smoothing in an augmented matrix space, achieve significantly lower computational overhead than Shampoo while matching one-sided Shampoo's best-known regret bound up to constant factors. Moreover, via the online-to-nonconvex conversion, this work derives two matrix-based optimizers, Pion (from FTPL) and Leon (from FAML), and proves convergence guarantees for them in nonsmooth nonconvex settings.
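To make the two smoothing constructions concrete, the block below is a minimal sketch of what Gaussian (stochastic) and hyperbolic (deterministic) smoothings of the nuclear-norm potential typically look like; it is illustrative only, and the exact potentials used in the paper (in particular FAML's augmented-matrix-space construction) may differ.

```latex
% Sketch only: generic smoothings of the nuclear-norm potential \Phi(G) = \|G\|_*,
% written for G \in \mathbb{R}^{m \times n} with m \le n; not copied from the paper.

% Gaussian (stochastic) smoothing, the kind used by FTPL-style methods:
\Phi_{\sigma}(G) \;=\; \mathbb{E}_{Z \sim \mathcal{N}(0, I)}\big[\,\lVert G + \sigma Z \rVert_{*}\,\big]

% Hyperbolic (deterministic) smoothing, obtained by applying
% |x| \approx \sqrt{x^{2} + \mu^{2}} to each singular value \sigma_i(G):
\Phi_{\mu}(G) \;=\; \sum_{i=1}^{m} \sqrt{\sigma_i(G)^{2} + \mu^{2}}
             \;=\; \operatorname{tr}\!\big((G G^{\top} + \mu^{2} I)^{1/2}\big)
```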
📝 Abstract
We study online linear optimization with matrix variables constrained by the operator norm, a setting whose geometry makes designing efficient, data-dependent adaptive algorithms challenging. The best-known adaptive regret bounds are achieved by Shampoo-like methods, but they require solving a costly quadratic projection subproblem. To address this, we extend the gradient-based prediction scheme to adaptive matrix online learning and cast algorithm design as constructing a family of smoothed potentials for the nuclear norm. We define a notion of admissibility for such smoothings and prove that any admissible smoothing yields a regret bound matching the best-known guarantees of one-sided Shampoo. We instantiate this framework with two efficient methods that avoid quadratic projections. The first is an adaptive Follow-the-Perturbed-Leader (FTPL) method using Gaussian stochastic smoothing. The second is Follow-the-Augmented-Matrix-Leader (FAML), which uses a deterministic hyperbolic smoothing in an augmented matrix space. By analyzing the admissibility of these smoothings, we show that both methods admit closed-form updates and match one-sided Shampoo's regret up to a constant factor, while significantly reducing computational cost. Lastly, using the online-to-nonconvex conversion, we derive two matrix-based optimizers, Pion (from FTPL) and Leon (from FAML). We prove convergence guarantees for these methods in nonsmooth nonconvex settings, a guarantee that the popular Muon optimizer lacks.
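To illustrate why stochastic smoothing yields a projection-free, closed-form update, here is a small numerical sketch of a non-adaptive FTPL-style step for linear losses over the operator-norm ball. The perturbation scale `sigma`, the function name, and the toy loop are illustrative assumptions and do not reproduce the paper's adaptive algorithm.

```python
import numpy as np

def ftpl_operator_norm_step(grad_sum, sigma, rng):
    """One FTPL-style step for online linear losses <G_t, X> over the
    operator-norm ball {X : ||X||_op <= 1}.

    The cumulative gradient is perturbed by a Gaussian matrix (stochastic
    smoothing of the nuclear-norm potential), and the linear minimizer over
    the operator-norm ball is recovered in closed form from an SVD:
        argmin_{||X||_op <= 1} <G, X> = -U V^T,  where G = U diag(s) V^T.
    """
    perturbed = grad_sum + sigma * rng.standard_normal(grad_sum.shape)
    U, _, Vt = np.linalg.svd(perturbed, full_matrices=False)
    return -U @ Vt  # closed-form iterate; no quadratic projection needed

# Toy usage: a short online linear optimization loop with random loss matrices.
rng = np.random.default_rng(0)
m, n, T, sigma = 5, 3, 10, 1.0
grad_sum = np.zeros((m, n))
for t in range(T):
    X_t = ftpl_operator_norm_step(grad_sum, sigma, rng)
    G_t = rng.standard_normal((m, n))  # stand-in for the round's loss matrix
    grad_sum += G_t                    # accumulate gradients for the next round
```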