🤖 AI Summary
Existing decentralized stochastic bilevel optimization (SBO) algorithms focus on asymptotic convergence rates while neglecting transient iteration complexity, thus failing to characterize the joint impact of network topology and data heterogeneity. To address this, we propose D-SOBA—the first inner-loop-free decentralized SBO algorithm—featuring single-loop stochastic gradient estimation, momentum-assisted coordinated updates of upper- and lower-level variables, and a graph Laplacian–constrained consensus mechanism enabling fully distributed communication. We establish the first transient complexity theory for decentralized SBO, achieving the optimal $O(1/\varepsilon^2)$ transient iteration complexity and asymptotically optimal gradient/Hessian query complexity under mild assumptions. Experiments demonstrate that D-SOBA significantly outperforms state-of-the-art methods across heterogeneous data distributions and diverse network topologies, exhibiting both high efficiency and robustness.
📝 Abstract
Stochastic bilevel optimization (SBO) is becoming increasingly essential in machine learning due to its versatility in handling nested structures. To address large-scale SBO, decentralized approaches have emerged as effective paradigms in which nodes communicate with immediate neighbors without a central server, thereby improving communication efficiency and enhancing algorithmic robustness. However, current decentralized SBO algorithms face challenges, including expensive inner-loop updates and a limited understanding of how network topology, data heterogeneity, and the nested bilevel algorithmic structure jointly affect performance. In this paper, we introduce a single-loop decentralized SBO algorithm (D-SOBA) and establish its transient iteration complexity, which, for the first time, clarifies the joint influence of network topology and data heterogeneity on decentralized bilevel algorithms. D-SOBA achieves the state-of-the-art asymptotic rate, asymptotic gradient/Hessian complexity, and transient iteration complexity under more relaxed assumptions than existing methods. Numerical experiments validate our theoretical findings.
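To make the single-loop structure described above concrete, here is a rough, hypothetical numpy sketch of one such scheme on a toy quadratic bilevel problem. This is not the paper's pseudocode: the problem instance, step sizes, ring topology, and variable names are all assumptions, deterministic gradients stand in for stochastic ones, and the point is only to show upper-, lower-, and auxiliary-variable updates happening in one loop with momentum and neighbor-only gossip averaging.

```python
import numpy as np

# Hedged toy sketch (NOT the paper's exact D-SOBA method): a single-loop,
# inner-loop-free decentralized bilevel update on a quadratic problem, with
# gossip averaging via a doubly stochastic mixing matrix W instead of a server.
# Node i holds:  lower level g_i(x, y) = 0.5*||y - A_i x||^2
#                upper level f_i(x, y) = 0.5*||y - b_i||^2

rng = np.random.default_rng(0)
n, d = 8, 3                                    # nodes, dimension (illustrative)
A = np.stack([np.eye(d) + 0.05 * rng.normal(size=(d, d)) for _ in range(n)])
b = rng.normal(size=(n, d))                    # heterogeneous local data

# Ring topology: each node mixes only with its two immediate neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

x = np.zeros((n, d))                           # upper-level variables
y = np.zeros((n, d))                           # lower-level variables
z = np.zeros((n, d))                           # Hessian-system variables
m = np.zeros((n, d))                           # momentum buffer
alpha, beta, gamma = 0.02, 0.2, 0.9            # step sizes (assumed, untuned)

for _ in range(10000):
    # Lower-level step: grad_y g_i = y_i - A_i x_i, then one gossip round.
    y = W @ (y - beta * (y - np.einsum('nij,nj->ni', A, x)))
    # z tracks [grad^2_yy g_i]^{-1} grad_y f_i; here grad^2_yy g_i = I.
    z = W @ (z - beta * (z - (y - b)))
    # Hypergradient direction: grad_x f_i - grad^2_xy g_i z_i = A_i^T z_i here.
    m = gamma * m + (1 - gamma) * np.einsum('nji,nj->ni', A, z)
    x = W @ (x - alpha * m)                    # momentum-assisted upper step

# Reference: hypergradient stationary point of the network-averaged problem.
Abar, bbar = A.mean(axis=0), b.mean(axis=0)
xstar = np.linalg.solve(Abar.T @ Abar, Abar.T @ bbar)
print(np.linalg.norm(x.mean(axis=0) - xstar))  # distance to x*; should be small
```

On this toy instance the node average of `x` approaches the stationary point of the averaged bilevel objective, while the residual gap and consensus error depend on how data heterogeneity interacts with the ring's spectral gap — precisely the kind of interplay the transient-complexity analysis above is meant to quantify.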