🤖 AI Summary
We address large-scale stochastic bilevel optimization problems where the upper-level objective is nonconvex and the lower-level objective is strongly convex. To tackle the challenge of imprecise stochastic hypergradient estimation induced by data sampling, we propose an adaptive stochastic bilevel learning framework. Methodologically, our approach integrates implicit function differentiation approximation with stochastic hypergradient estimation, and—under mild assumptions that dispense with fixed inner-loop iteration counts or stringent variance constraints—we establish, for the first time, a convergence theory for nonconvex stochastic bilevel optimization. Our key contributions are: (1) an adaptive, hyperparameter-lightweight update mechanism; and (2) a theoretical bridge linking inexact stochastic hypergradients to nonconvex stochastic optimization. Empirically, on image denoising and deblurring tasks, our method achieves significantly improved training efficiency and superior generalization performance compared to adaptive deterministic bilevel approaches.
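For context, the bilevel structure described above (strongly convex lower level, nonconvex finite-sum upper level, hypergradients via implicit differentiation) can be sketched in standard form; the notation here is illustrative and may differ from the paper's:

```latex
% Bilevel problem: nonconvex finite-sum upper level, strongly convex lower level
\min_{\theta}\; F(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} f_i\!\big(x^{*}(\theta)\big),
\qquad
x^{*}(\theta) \;=\; \arg\min_{x}\; g(x,\theta).

% Hypergradient via the implicit function theorem
% (strong convexity of g in x makes \nabla^2_{xx} g invertible):
\nabla F(\theta)
\;=\; -\,\nabla^2_{\theta x} g\big(x^{*}(\theta),\theta\big)\,
\big[\nabla^2_{xx} g\big(x^{*}(\theta),\theta\big)\big]^{-1}\,
\nabla f\big(x^{*}(\theta)\big).
```

Sampling a minibatch of the $f_i$ yields a stochastic hypergradient, and solving the lower-level problem only approximately makes it additionally inexact; the paper's convergence theory concerns exactly this inexact stochastic setting.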
📝 Abstract
Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly-stochastic framework that rests on impractical variance assumptions, enforces a fixed number of lower-level iterations, and requires extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum-of-functions in the upper level. Stochasticity arises from data sampling in the upper level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art stochastic optimization theory for nonconvex objectives. Furthermore, we prove the convergence of inexact stochastic bilevel optimization under mild assumptions. Our empirical results highlight significant speed-ups and improved generalization in imaging tasks such as image denoising and deblurring in comparison with adaptive deterministic bilevel methods.