🤖 AI Summary
This paper addresses the high computational cost of evaluating the gradient of the upper-level objective (the hypergradient) in bilevel optimization, particularly when the lower-level problem is large-scale empirical risk minimization. We propose and analyze a stochastic hypergradient estimation method. Under a weak assumption, namely that the lower-level update operator is a contraction in expectation, we establish, for the first time, iteration complexity bounds on the mean square error of the hypergradient estimate, relaxing the classical requirements of strong monotonicity or exact lower-level optimization. Theoretically, the method enjoys provable convergence rates; empirically, it demonstrates superior computational efficiency and accuracy. Our key contributions are: (1) significantly weakening critical assumptions to broaden applicability; (2) providing the first complexity analysis of stochastic hypergradient approximation grounded in expected contraction; and (3) delivering a scalable, theoretically guaranteed algorithmic framework for large-scale bilevel learning.
📝 Abstract
Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step in the design of optimization algorithms for bilevel problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. We provide iteration complexity bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. Preliminary numerical experiments support our theoretical analysis.
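As a rough illustration of the kind of scheme the abstract describes, the sketch below uses a toy ridge-regression lower level with a validation loss on top; all problem data, step sizes, and iteration counts are our own hypothetical choices, not taken from the paper. Minibatch SGD plays the role of the stochastic lower-level mapping (a contraction in expectation for a small enough step size), and the hypergradient is then approximated by implicit differentiation with a stochastic Neumann series built from minibatch Hessian-vector products.

```python
import numpy as np

# Hypothetical toy instance (not the paper's experiments):
#   lower level: w(lam) = argmin_w  (1/2n)||X w - y||^2 + (lam/2)||w||^2
#   upper level: f(lam) = (1/2)||X_val w(lam) - y_val||^2
rng = np.random.default_rng(0)
n, n_val, d = 200, 50, 5
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.5 * rng.standard_normal(n)
X_val = rng.standard_normal((n_val, d))
y_val = X_val @ w_true + 0.5 * rng.standard_normal(n_val)
lam, eta, batch = 0.5, 0.05, 20

# 1) Approximate the lower-level solution by minibatch SGD. Each update
#    w <- w - eta * grad_B(w) is affine with random matrix I - eta * H_B,
#    a contraction in expectation for small eta (the key assumption).
w = np.zeros(d)
for _ in range(4000):
    idx = rng.integers(0, n, batch)
    w -= eta * (X[idx].T @ (X[idx] @ w - y[idx]) / batch + lam * w)

# 2) Implicit differentiation: the hypergradient is -(d^2_{lam,w} g)^T v
#    with v = (d^2_w g)^{-1} grad_w f. Approximate v by the stochastic
#    Neumann series  v ~= eta * sum_k [prod_{j<=k} (I - eta * H_{B_j})] g,
#    which needs only minibatch Hessian-vector products.
g_up = X_val.T @ (X_val @ w - y_val)   # upper-level gradient at w
v = eta * g_up                          # k = 0 term of the series
p = g_up.copy()
for _ in range(2000):
    idx = rng.integers(0, n, batch)
    p -= eta * (X[idx].T @ (X[idx] @ p) / batch + lam * p)  # (I - eta H_B) p
    v += eta * p

# For this g, the cross derivative d^2_{lam,w} g equals w, so the
# stochastic hypergradient estimate is:
hypergrad = -w @ v
```

For this quadratic toy problem the exact hypergradient is available in closed form (solve the two linear systems with the full Hessian `X.T @ X / n + lam * I`), which makes it easy to sanity-check the stochastic estimate; the mean square error of such estimates is exactly what the paper's iteration complexity bounds control.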