🤖 AI Summary
This work addresses the long-standing open problem of convergence and sample complexity in stochastic bilevel optimization: can the optimal $O(1/\varepsilon^2)$ sample complexity—matching that of single-level stochastic optimization—be achieved? To this end, we propose PnPBO, a plug-and-play single-loop framework that flexibly incorporates diverse stochastic gradient estimators (e.g., PAGE, ZeroSARAH) and introduces a moving-average scheme for the upper-level variable to enhance stability. For the first time, we establish a unified convergence analysis accommodating both biased and unbiased gradient estimators, rigorously proving convergence to an $\varepsilon$-stationary point with the optimal $O(1/\varepsilon^2)$ sample complexity. Empirical evaluations on hyperparameter optimization and data distillation demonstrate that PnPBO significantly outperforms existing methods, thereby resolving the fundamental question of whether bilevel optimization can achieve sample efficiency on par with its single-level counterpart.
📝 Abstract
Bilevel optimization has recently attracted significant attention in machine learning due to its wide range of applications and advanced hierarchical optimization capabilities. In this paper, we propose a plug-and-play framework, named PnPBO, for developing and analyzing stochastic bilevel optimization methods. This framework integrates both modern unbiased and biased stochastic estimators into the single-loop bilevel optimization framework introduced in [9], with several improvements. In the implementation of PnPBO, all stochastic estimators for different variables can be independently incorporated, and an additional moving average technique is applied when using an unbiased estimator for the upper-level variable. In the theoretical analysis, we provide a unified convergence and complexity analysis for PnPBO, demonstrating that the adaptation of various stochastic estimators (including PAGE, ZeroSARAH, and mixed strategies) within the PnPBO framework achieves optimal sample complexity, comparable to that of single-level optimization. This resolves the open question of whether the optimal complexity bounds for solving bilevel optimization are identical to those for single-level optimization. Finally, we empirically validate our framework, demonstrating its effectiveness on several benchmark problems and confirming our theoretical findings.
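To make the single-loop structure and the moving-average technique mentioned above concrete, here is a minimal toy sketch. It is not the paper's PnPBO algorithm: the quadratic objectives, step sizes, and the exact gradient estimator are illustrative placeholders, and the only idea carried over is that each iteration takes one lower-level step, one upper-level step, and then smooths the upper-level iterate with an exponential moving average.

```python
def toy_bilevel_step(x, y, x_avg, lr_x=0.1, lr_y=0.5, beta=0.9):
    """One single-loop iteration on a toy bilevel problem.

    Lower level:  y*(x) = argmin_y (y - x)^2
    Upper level:  min_x  (x - 1)^2 + y*(x)^2   (minimizer: x = 0.5)
    """
    # Lower-level gradient step: pulls y toward the current x.
    y = y - lr_y * 2.0 * (y - x)
    # Upper-level (hyper)gradient: 2(x - 1) + 2y * dy*/dx, with dy*/dx = 1
    # for this toy lower-level problem. A stochastic estimator would
    # replace this exact gradient in a real implementation.
    grad_x = 2.0 * (x - 1.0) + 2.0 * y
    x = x - lr_x * grad_x
    # Moving average stabilizes the upper-level iterate.
    x_avg = beta * x_avg + (1.0 - beta) * x
    return x, y, x_avg


if __name__ == "__main__":
    x, y, x_avg = 0.0, 0.0, 0.0
    for _ in range(500):
        x, y, x_avg = toy_bilevel_step(x, y, x_avg)
    print(x, y, x_avg)  # all three approach the optimum 0.5
```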