🤖 AI Summary
This paper addresses the high computational cost of evaluating the gradient of the upper-level objective (the hypergradient) in bilevel optimization, particularly when the lower-level problem is large-scale empirical risk minimization. We propose and analyze a stochastic hypergradient estimation method. Under a weak assumption, namely that the lower-level update operator is a contraction in expectation, we establish, for the first time, iteration complexity bounds on the mean square error of the hypergradient estimate, relaxing the classical requirements of strong monotonicity or exact lower-level optimization. Theoretically, the method enjoys provable convergence rates; empirically, it demonstrates superior computational efficiency and accuracy. Our key contributions are: (1) significantly weakening critical assumptions to broaden applicability; (2) providing the first complexity analysis of stochastic hypergradient approximation grounded in expected contraction; and (3) delivering a scalable, theoretically guaranteed algorithmic framework for large-scale bilevel learning.
📝 Abstract
Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step in the design of optimization algorithms for bilevel problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. We provide iteration complexity bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. Preliminary numerical experiments support our theoretical analysis.
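As a rough illustration of the kind of scheme the abstract describes, the sketch below uses a toy ridge-regression lower level with a validation loss on top; all problem data, step sizes, and iteration counts are our own hypothetical choices, not taken from the paper. Minibatch SGD plays the role of the stochastic lower-level mapping (a contraction in expectation for a small enough step size), and the hypergradient is then approximated by implicit differentiation with a stochastic Neumann series built from minibatch Hessian-vector products.

```python
import numpy as np

# Hypothetical toy instance (not the paper's experiments):
#   lower level: w(lam) = argmin_w  (1/2n)||X w - y||^2 + (lam/2)||w||^2
#   upper level: f(lam) = (1/2)||X_val w(lam) - y_val||^2
rng = np.random.default_rng(0)
n, n_val, d = 200, 50, 5
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.5 * rng.standard_normal(n)
X_val = rng.standard_normal((n_val, d))
y_val = X_val @ w_true + 0.5 * rng.standard_normal(n_val)
lam, eta, batch = 0.5, 0.05, 20

# 1) Approximate the lower-level solution by minibatch SGD. Each update
#    w <- w - eta * grad_B(w) is affine with random matrix I - eta * H_B,
#    a contraction in expectation for small eta (the key assumption).
w = np.zeros(d)
for _ in range(4000):
    idx = rng.integers(0, n, batch)
    w -= eta * (X[idx].T @ (X[idx] @ w - y[idx]) / batch + lam * w)

# 2) Implicit differentiation: the hypergradient is -(d^2_{lam,w} g)^T v
#    with v = (d^2_w g)^{-1} grad_w f. Approximate v by the stochastic
#    Neumann series  v ~= eta * sum_k [prod_{j<=k} (I - eta * H_{B_j})] g,
#    which needs only minibatch Hessian-vector products.
g_up = X_val.T @ (X_val @ w - y_val)   # upper-level gradient at w
v = eta * g_up                          # k = 0 term of the series
p = g_up.copy()
for _ in range(2000):
    idx = rng.integers(0, n, batch)
    p -= eta * (X[idx].T @ (X[idx] @ p) / batch + lam * p)  # (I - eta H_B) p
    v += eta * p

# For this g, the cross derivative d^2_{lam,w} g equals w, so the
# stochastic hypergradient estimate is:
hypergrad = -w @ v
```

For this quadratic toy problem the exact hypergradient is available in closed form (solve the two linear systems with the full Hessian `X.T @ X / n + lam * I`), which makes it easy to sanity-check the stochastic estimate; the mean square error of such estimates is exactly what the paper's iteration complexity bounds control.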