Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

📅 2026-05-25
🤖 AI Summary
This work addresses the problem of achieving optimal regret in adversarial online learning with hidden-convex losses, that is, nonconvex losses that become convex only after a nonlinear reparameterization. Through a sharper analysis of discrete-time algorithmic equivalence, the study shows that online gradient descent (OGD) attains the optimal $O(\sqrt{T})$ regret in this setting, improving on the previously known $O(T^{2/3})$ bound. It also introduces a necessary-and-sufficient Hessian compatibility condition that characterizes the geometry required for algorithmic equivalence, and shows that the condition is essential: when it fails, there is a smooth reparameterization and an adversarial loss sequence on which OGD suffers $\Omega(T)$ regret. Finally, the results are extended to one-point bandit feedback, yielding an $O(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing. The full-information rate matches the optimal worst-case rate for online convex optimization, and the bandit rate matches the classical rate of the same algorithm on convex losses.
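For orientation, the setting can be written down as follows. This is a sketch in assumed notation: the symbols $f_t$, $g_t$, $c$, $\mathcal{K}$ and $x_t$ are chosen here for illustration and are not taken from the paper. The learner plays $x_t \in \mathcal{K}$, each loss $f_t$ may be nonconvex in $x$ but is a convex function $g_t$ of the reparameterized variable $c(x)$, and regret is measured against the best fixed point in hindsight:

$$
f_t(x) = g_t\bigl(c(x)\bigr) \ \text{with } g_t \text{ convex}, \qquad \mathrm{Regret}_T = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x).
$$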
📝 Abstract
We study adversarial online learning with hidden-convex losses, i.e., nonconvex losses that become convex after a nonlinear reparameterization. Ghai, Lu and Hazan (2022) proved that, under geometric and smoothness assumptions, online gradient descent (OGD) on such nonconvex losses approximately simulates online mirror descent (OMD) on the underlying convex losses with a suitable regularizer, yielding $\mathcal{O}(T^{2/3})$ regret. They left open whether the optimal $\Theta(\sqrt{T})$ regret from online convex optimization can be recovered in this hidden-convex setting. We answer this question affirmatively. More specifically, via a sharper discrete-time algorithmic equivalence argument, we prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret under the same assumptions, matching the optimal worst-case rate for adversarial online convex optimization. We also address another open question of Ghai, Lu and Hazan (2022) by clarifying the geometry required for this algorithmic equivalence. We replace the diagonal-Jacobian sufficient condition with a necessary-and-sufficient Hessian compatibility condition, thereby expanding the class of admissible reparameterizations. We complement our tight regret bound with a lower bound showing that the Hessian compatibility assumption is essential for OGD; when it fails, we construct a smooth reparameterization and an adversarial sequence of hidden-convex losses for which OGD suffers $\Omega(T)$ regret. Finally, we extend our analysis to one-point bandit feedback and prove an $\mathcal{O}(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing, matching its classical rate on convex losses.
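To make the setting concrete, here is a minimal sketch of projected OGD run directly on a nonconvex but hidden-convex loss. Everything in it is an assumption chosen for illustration rather than the paper's construction: the one-dimensional reparameterization `c(x) = x**2` on the interval [0.5, 2], the quadratic hidden losses `g_t(u) = (u - a_t)**2`, the step size, and the randomly drawn (not adversarial) targets. The resulting loss `(x**2 - a)**2` is nonconvex in `x` wherever `x**2 < a/3`, yet convex in `u = x**2`. Because the toy problem is one-dimensional, its Jacobian is trivially diagonal, so it says nothing about the multi-dimensional geometry the abstract discusses.

```python
import math
import random


def c(x):
    """Hypothetical reparameterization u = c(x) = x^2, invertible on [0.5, 2]."""
    return x * x


def loss(x, a):
    """Toy hidden-convex loss f_t(x) = g_t(c(x)) with g_t(u) = (u - a)^2."""
    return (c(x) - a) ** 2


def grad(x, a):
    """Derivative of the loss with respect to the original variable x."""
    return 4.0 * x * (c(x) - a)


def run_ogd(targets, lo=0.5, hi=2.0):
    """Projected OGD on x in [lo, hi]; returns the point played in each round."""
    T = len(targets)
    diameter = hi - lo
    # Crude gradient bound, valid when every target lies in [c(lo), c(hi)].
    grad_bound = 4.0 * hi * (c(hi) - c(lo))
    eta = diameter / (grad_bound * math.sqrt(T))  # standard 1/sqrt(T) step size
    x, played = lo, []
    for a in targets:
        played.append(x)
        # Gradient step in x, then projection (clipping) back onto [lo, hi].
        x = min(hi, max(lo, x - eta * grad(x, a)))
    return played


def regret(played, targets, lo=0.5, hi=2.0):
    """Regret against the best fixed point in hindsight."""
    # g_t is quadratic in u, so the best fixed u is the clipped mean target;
    # mapping back through sqrt (the inverse of c here) gives the best fixed x.
    u_star = min(c(hi), max(c(lo), sum(targets) / len(targets)))
    x_star = math.sqrt(u_star)
    alg = sum(loss(x, a) for x, a in zip(played, targets))
    best = sum(loss(x_star, a) for a in targets)
    return alg - best


rng = random.Random(0)
T = 2000
targets = [rng.uniform(0.5, 3.5) for _ in range(T)]
played = run_ogd(targets)
print("regret / sqrt(T):", regret(played, targets) / math.sqrt(T))
```

Rerunning with larger `T` and watching whether `regret / sqrt(T)` stays bounded gives a rough empirical feel for the $\mathcal{O}(\sqrt{T})$ rate, though a single random sequence is not evidence about worst-case adversarial behavior.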
Problem

Research questions and friction points this paper is trying to address.

online learning
hidden-convex losses
regret minimization
algorithmic equivalence
bandit feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

hidden-convex losses
algorithmic equivalence
optimal regret
Hessian compatibility
bandit feedback