Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies the complexity of computing an ε-stationary point in stochastic bilevel optimization, where the upper-level objective is nonconvex and the lower-level objective is strongly convex. For problems with p-th-order smoothness, the authors propose the F²SA-p family of algorithms, which reinterprets the F²SA hypergradient estimator as a forward difference and generalizes it to p-th-order finite differences. This yields an improved sample-complexity upper bound of Õ(p·ε^{−4−2/p}), recovering the prior best-known Õ(ε⁻⁶) rate of F²SA at p = 1 and improving on it for every p ≥ 2. The authors further prove that the Ω(ε⁻⁴) lower bound continues to hold when higher-order smoothness is assumed in the lower-level variable, so F²SA-p is nearly optimal in the highly smooth regime p = Ω(log ε⁻¹ / log log ε⁻¹). This is the first work to show that higher-order smoothness can fundamentally accelerate stochastic bilevel optimization, establishing a new algorithmic paradigm for designing efficient solvers.

📝 Abstract
This paper studies the complexity of finding an $ε$-stationary point for stochastic bilevel optimization when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent work proposed the first-order method F${}^2$SA, achieving the $\tilde{\mathcal{O}}(ε^{-6})$ upper complexity bound for first-order smooth problems. This is slower than the optimal $Ω(ε^{-4})$ complexity lower bound in its single-level counterpart. In this work, we show that faster rates are achievable for higher-order smooth problems. We first reformulate F${}^2$SA as approximating the hyper-gradient with a forward difference. Based on this observation, we propose a class of methods F${}^2$SA-$p$ that uses $p$th-order finite differences for hyper-gradient approximation and improves the upper bound to $\tilde{\mathcal{O}}(p\,ε^{-4-2/p})$ for $p$th-order smooth problems. Finally, we demonstrate that the $Ω(ε^{-4})$ lower bound also holds for stochastic bilevel problems when the high-order smoothness holds for the lower-level variable, indicating that the upper bound of F${}^2$SA-$p$ is nearly optimal in the highly smooth region $p = Ω(\log ε^{-1} / \log\log ε^{-1})$.
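The forward-difference view of the hyper-gradient can be illustrated on a toy deterministic instance. The sketch below is not the paper's stochastic F²SA-p algorithm: it uses a hypothetical 1-D quadratic bilevel problem (g(x, y) = ½(y − Ax)², f(x, y) = ½x² + ½y²), a plain gradient-descent inner solver, and a first-order forward difference with step h = 1/λ to estimate the hyper-gradient of φ(x) = f(x, y*(x)).

```python
A = 2.0  # hypothetical lower-level coupling: y*(x) = A*x

def solve_lower(x, h, steps=300, lr=0.4):
    """Gradient descent on y -> g(x, y) + h*f(x, y), where
    g(x, y) = 0.5*(y - A*x)**2 and f(x, y) = 0.5*x**2 + 0.5*y**2.
    The exact minimizer is y = A*x / (1 + h)."""
    y = 0.0
    for _ in range(steps):
        y -= lr * ((y - A * x) + h * y)  # d/dy [g + h*f]
    return y

def hypergrad_fd(x, lam):
    """Forward-difference hyper-gradient estimate with step h = 1/lam:
    grad phi(x) ~ d_x f(x, y_h) + (d_x g(x, y_h) - d_x g(x, y_0)) / h."""
    h = 1.0 / lam
    y_h, y_0 = solve_lower(x, h), solve_lower(x, 0.0)
    d_x_f = x                            # partial_x f(x, y) = x
    d_x_g = lambda y: -A * (y - A * x)   # partial_x g(x, y)
    return d_x_f + (d_x_g(y_h) - d_x_g(y_0)) / h

# True hyper-gradient of phi(x) = f(x, y*(x)) is (1 + A**2) * x.
x = 1.0
for lam in (10.0, 100.0, 1000.0):
    print(lam, abs(hypergrad_fd(x, lam) - (1 + A**2) * x))
```

On this instance the bias of the first-order forward difference shrinks as O(h) = O(1/λ); the higher-order variants discussed in the abstract replace this stencil with a p-th-order finite difference to drive the bias down faster under p-th-order smoothness.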
Problem

Research questions and friction points this paper is trying to address.

Improving complexity bounds for stochastic bilevel optimization
Achieving faster rates with higher-order smoothness assumptions
Developing nearly optimal methods for hyper-gradient approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates the F²SA hyper-gradient estimator as a forward difference and generalizes it with pth-order finite differences
Improves the complexity upper bound for higher-order smooth problems
Achieves near-optimal rates in the highly smooth regime
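The payoff from higher-order stencils can be seen already for ordinary derivatives. The sketch below is a generic numerical illustration, not code from the paper: it builds the p-th-order forward-difference stencil for f′(x) by solving the moment conditions on the nodes x, x+h, …, x+ph, so the bias is O(hᵖ) when f is smooth enough.

```python
def fd_coeffs(p):
    """Coefficients c_0..c_p of the p-th order forward-difference stencil
    for f'(x), from the moment conditions sum_k c_k * k**j = [j == 1],
    j = 0..p (a Vandermonde system, solved by Gaussian elimination)."""
    n = p + 1
    # Augmented system M c = e_1 with M[j][k] = k**j.
    A = [[float(k) ** j for k in range(n)] + [1.0 if j == 1 else 0.0]
         for j in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= m * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def forward_diff(f, x, h, p=1):
    """Approximate f'(x) with bias O(h**p) for (p+1)-times smooth f."""
    return sum(c * f(x + k * h) for k, c in enumerate(fd_coeffs(p))) / h

import math
for p in (1, 2, 3):  # error at the same step h falls with the order p
    err = abs(forward_diff(math.exp, 0.0, h=0.1, p=p) - 1.0)
    print(f"p={p}: |error| = {err:.2e}")
```

For p = 1 this recovers the plain forward difference (f(x+h) − f(x))/h, and for p = 2 the familiar stencil (−3f(x) + 4f(x+h) − f(x+2h))/(2h); higher-order smoothness is exactly what licenses the larger stencils with smaller bias.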