🤖 AI Summary
This work addresses optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers, a setting in which the upper-level hyper-objective can be non-differentiable. The authors show that, under a local Polyak–Łojasiewicz condition, uniqueness of the optimistic selection suffices for differentiability and yields an explicit pseudoinverse-based hypergradient formula. The lower-level solution set need not be a singleton; local smoothness of the hyper-objective follows when the selected minimizer is nondegenerate along the solution manifold. The proposed select-then-differentiate method, HG-MS, converges to a stationary point of the optimistic objective, with complexity governed by the intrinsic dimension of the solution manifold rather than its ambient dimension. Empirically, a practical variant of HG-MS is evaluated on matched-budget LLM source reweighting, where it obtains the best GSM8K and MATH scores across the tested backbones and competitive or best MT-Bench results.
📝 Abstract
We study optimistic bilevel optimization when the lower-level problem has a non-isolated manifold of minimizers. In this setting, the hyper-objective may be non-differentiable because the upper-level criterion must choose among multiple lower-level solutions. Under a local Polyak–Łojasiewicz (PŁ) condition, we show that differentiability does not require the lower-level solution set to be a singleton: uniqueness of the optimistic selection is sufficient. This yields an explicit pseudoinverse-based hyper-gradient formula extending the classical singleton-minimizer result. We further characterize the regularity of the hyper-objective: non-degeneracy of the selected minimizer along the solution manifold yields local smoothness, while failure of uniqueness can create many non-differentiable points and failure of non-degeneracy can destroy all positive Hölder regularity of the hyper-gradient. Motivated by this theory, we propose HG-MS, a select-then-differentiate method combining explicit optimistic selection with efficient pseudoinverse-based hyper-gradient computation. Despite the nonconvex nature of optimistic selection over the lower-level solution manifold, we show that HG-MS converges to a stationary point of the optimistic objective with complexity governed by the intrinsic dimension of the solution manifold rather than its ambient dimension. Empirically, we test a practical variant of HG-MS for matched-budget LLM source reweighting. This variant preserves the select-then-differentiate principle and obtains the best GSM8K/MATH scores across the tested backbones, along with competitive or best MT-Bench instruction-following results.
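The select-then-differentiate idea can be illustrated on a toy problem. The sketch below is not HG-MS and is not taken from the paper: the objectives `g` and `f`, the vectors `a` and `c`, and all function names are hypothetical, and the hypergradient expression used is the classical implicit-differentiation formula with the Hessian inverse replaced by a pseudoinverse, which is assumed here to be the natural form of such an extension. The lower-level minimizers form a line rather than a point, the selection step picks the minimizer that is best for the upper level, and the differentiation step evaluates the pseudoinverse expression at that point.

```python
import numpy as np

# Hypothetical toy problem (illustrative only, not from the paper):
#   lower level  g(x, y) = 0.5 * (a @ y - x)**2,  y in R^2
#   upper level  f(x, y) = 0.5 * ||y - c||^2
# For each scalar x, the lower-level minimizers form the line {y : a @ y = x}.
a = np.array([1.0, 2.0])
c = np.array([0.5, -1.0])


def select(x):
    """Optimistic selection: the lower-level minimizer with the smallest f,
    i.e. the projection of c onto the line {y : a @ y = x}."""
    return c + a * (x - a @ c) / (a @ a)


def hypergradient(x):
    """Assumed pseudoinverse form of the hypergradient at the selected point:
    grad_x f - hess_xy g @ pinv(hess_yy g) @ grad_y f."""
    y = select(x)
    grad_x_f = 0.0               # f does not depend on x directly
    grad_y_f = y - c
    hess_yy_g = np.outer(a, a)   # rank 1, so it has no ordinary inverse
    hess_xy_g = -a               # derivative in x of grad_y g = a * (a @ y - x)
    return grad_x_f - hess_xy_g @ np.linalg.pinv(hess_yy_g) @ grad_y_f


def hyper_objective(x):
    """Optimistic objective: f evaluated at the selected minimizer."""
    y = select(x)
    return 0.5 * np.sum((y - c) ** 2)


if __name__ == "__main__":
    x0, h = 1.0, 1e-5
    finite_diff = (hyper_objective(x0 + h) - hyper_objective(x0 - h)) / (2 * h)
    print("pseudoinverse hypergradient:", hypergradient(x0))
    print("finite-difference estimate: ", finite_diff)
```

In this toy case the selection has a closed form and is unique, so the two printed values should agree closely. The harder cases discussed in the abstract, where the selection is nonconvex or fails to be unique or non-degenerate, are not covered by the sketch.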