🤖 AI Summary
This work studies decentralized stochastic bilevel optimization (DSBO), where multiple agents collaboratively solve a bilevel problem over a communication graph without a central coordinator. To address this, we propose DSGDA-GT, the first purely first-order decentralized algorithm for DSBO, which integrates stochastic gradient descent/ascent with gradient tracking (GT) and completely avoids second-order computations. Theoretically, we establish an $\mathcal{O}(n^{-1}\varepsilon^{-7})$ sample complexity, achieving linear speedup in the number of agents $n$ and matching the best-known convergence rate of centralized single-machine methods. Empirically, our approach significantly outperforms existing decentralized bilevel optimization algorithms in both communication efficiency and training speed, while retaining strong theoretical guarantees and practical scalability.
📝 Abstract
This paper focuses on decentralized stochastic bilevel optimization (DSBO), where agents communicate only with their neighbors. We propose Decentralized Stochastic Gradient Descent and Ascent with Gradient Tracking (DSGDA-GT), a novel algorithm that requires only first-order oracles, which are much cheaper than the second-order oracles widely adopted in existing works. We further provide a finite-time convergence analysis showing that for $n$ agents collaboratively solving the DSBO problem, the sample complexity of finding an $\epsilon$-stationary point with our algorithm is $\mathcal{O}(n^{-1}\epsilon^{-7})$, which matches the currently best-known result for the single-agent counterpart, with linear speedup. Numerical experiments demonstrate both the communication and training efficiency of our algorithm.
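To make the gradient-tracking ingredient concrete, below is a minimal sketch of the GT mechanism on a toy single-level quadratic problem over a ring graph. This is *not* the paper's full DSGDA-GT method (no bilevel structure, no stochastic gradients); the problem, step size, and mixing weights are illustrative assumptions. Each agent holds a local objective $f_i(x) = \tfrac{1}{2}(x - b_i)^2$, and GT lets all agents converge to the minimizer of the network average, $\bar{b}$, using only neighbor communication.

```python
import numpy as np

# Toy illustration of gradient tracking (GT), the building block named in
# the abstract -- not the paper's DSGDA-GT algorithm itself.
rng = np.random.default_rng(0)
n = 5                        # number of agents
b = rng.normal(size=n)       # local optima; the network optimum is b.mean()

# Doubly stochastic mixing matrix for a ring graph (illustrative weights).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

grad = lambda x: x - b       # elementwise local gradients of f_i

x = np.zeros(n)              # one scalar decision variable per agent
y = grad(x)                  # tracker initialized at the local gradients
alpha = 0.1                  # step size (assumed, not from the paper)

for _ in range(500):
    x_new = W @ x - alpha * y            # consensus step + tracked-gradient step
    y = W @ y + grad(x_new) - grad(x)    # track the network-average gradient
    x = x_new

print(x)  # all agents should be close to b.mean()
```

Because $W$ is doubly stochastic, the tracker preserves the invariant $\sum_i y_i^k = \sum_i \nabla f_i(x_i^k)$, so at consensus the tracked direction equals the true average gradient; this is what lets decentralized methods match centralized rates.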