Bilevel Learning with Inexact Stochastic Gradients

📅 2024-12-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
We address large-scale stochastic bilevel optimization problems in which the upper-level objective is a nonconvex sum of functions and the lower-level objective is strongly convex. To handle the inexact stochastic hypergradients that arise from data sampling in the upper level, we combine approximate implicit differentiation with stochastic hypergradient estimation and connect the resulting inexact estimates to state-of-the-art stochastic optimization theory for nonconvex objectives. Under mild assumptions that require neither a fixed number of lower-level iterations nor restrictive variance bounds, we prove convergence of inexact stochastic bilevel optimization. Our key contributions are: (1) an adaptive update mechanism with few hyperparameters to tune; and (2) a theoretical bridge between inexact stochastic hypergradients and nonconvex stochastic optimization. Empirically, on image denoising and deblurring tasks, our method trains significantly faster and generalizes better than adaptive deterministic bilevel approaches.
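For orientation, the bilevel setting and the implicit-differentiation hypergradient mentioned in the summary can be written as follows (a minimal sketch in our own notation; the paper's exact formulation may differ):

\min_{\theta} \; F(\theta) := \frac{1}{m}\sum_{i=1}^{m} f_i\bigl(\hat{x}(\theta), \theta\bigr)
\quad \text{s.t.} \quad \hat{x}(\theta) = \arg\min_{x} \, g(x, \theta),

with g(\cdot, \theta) strongly convex in x. Writing f := \frac{1}{m}\sum_{i} f_i, the implicit function theorem gives

\nabla F(\theta) = \nabla_{\theta} f\bigl(\hat{x}(\theta), \theta\bigr)
- \nabla^{2}_{\theta x} g\bigl(\hat{x}(\theta), \theta\bigr)
\bigl[\nabla^{2}_{xx} g\bigl(\hat{x}(\theta), \theta\bigr)\bigr]^{-1}
\nabla_{x} f\bigl(\hat{x}(\theta), \theta\bigr).

Inexactness enters because \hat{x}(\theta) and the linear solve are computed only approximately, while the upper-level gradients are evaluated on mini-batches of data.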

📝 Abstract
Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly-stochastic framework that relies on impractical variance assumptions, enforces a fixed number of lower-level iterations, and requires extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum-of-functions in the upper level. Stochasticity arises from data sampling in the upper level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art stochastic optimization theory for nonconvex objectives. Furthermore, we prove the convergence of inexact stochastic bilevel optimization under mild assumptions. Our empirical results highlight significant speed-ups and improved generalization in imaging tasks such as image denoising and deblurring in comparison with adaptive deterministic bilevel methods.
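To make the setting concrete, below is a minimal toy sketch (ours, not the authors' algorithm): the lower level is ridge regression with regularization weight theta, the upper level is a mini-batch validation loss, and the hypergradient follows the implicit-differentiation formula given above. In the paper's setting, the lower-level solve and the linear system would only be solved to an adaptively chosen accuracy, which is what makes the stochastic hypergradient inexact.

import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 10
A_tr, b_tr = rng.standard_normal((n, d)), rng.standard_normal(n)
A_val, b_val = rng.standard_normal((n, d)), rng.standard_normal(n)

def lower_solution(theta):
    # Strongly convex lower level: min_x 0.5*||A_tr x - b_tr||^2 + 0.5*theta*||x||^2.
    # Solved exactly here; stopping an iterative solver early gives the inexact variant.
    return np.linalg.solve(A_tr.T @ A_tr + theta * np.eye(d), A_tr.T @ b_tr)

def stochastic_hypergradient(theta, batch):
    x = lower_solution(theta)
    H = A_tr.T @ A_tr + theta * np.eye(d)        # lower-level Hessian in x
    r = A_val[batch] @ x - b_val[batch]
    gx = A_val[batch].T @ r / len(batch)         # sampled upper-level gradient in x
    q = np.linalg.solve(H, gx)                   # in practice: a few CG iterations
    return -x @ q                                # IFT hypergradient (d/dtheta of grad_x g is x)

theta = 1.0
for _ in range(200):
    batch = rng.choice(n, size=8, replace=False)
    theta = max(theta - 0.05 * stochastic_hypergradient(theta, batch), 1e-6)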
Problem

Research questions and friction points this paper is trying to address.

Addresses inefficiency in bilevel learning methods
Focuses on inexact stochastic gradients for large-scale problems
Improves convergence and generalization in imaging tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inexact stochastic gradients for bilevel learning
Strongly convex lower-level with nonconvex upper-level
Convergence proven under mild assumptions
Mohammad Sadegh Salehi
Department of Mathematical Sciences, University of Bath, Bath, BA2 7AY, UK
Subhadip Mukherjee
Assistant Professor, Department of E&ECE, IIT Kharagpur, India
Machine Learning, Inverse Problems in Imaging, Optimization
Lindon Roberts
School of Mathematics and Statistics, University of Sydney, Camperdown NSW 2006, Australia
Matthias Joachim Ehrhardt
Department of Mathematical Sciences, University of Bath, Bath, BA2 7AY, UK