Fully First-Order Methods for Decentralized Bilevel Optimization

📅 2024-10-25
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work studies decentralized stochastic bilevel optimization (DSBO), where multiple agents collaboratively solve a bilevel problem over a communication graph without a central coordinator. To address this, we propose DSGDA-GT, the first purely first-order decentralized algorithm for this setting: it integrates stochastic gradient descent/ascent with gradient tracking (GT) and completely avoids second-order computations. Theoretically, we establish an $\mathcal{O}(n^{-1}\varepsilon^{-7})$ sample complexity, achieving linear speedup in the number of agents $n$ and matching the best-known convergence rate of centralized single-machine methods. Empirically, our approach significantly outperforms existing decentralized bilevel optimization algorithms in both communication efficiency and training speed, while retaining strong theoretical guarantees and practical scalability.

📝 Abstract
This paper focuses on decentralized stochastic bilevel optimization (DSBO) where agents only communicate with their neighbors. We propose Decentralized Stochastic Gradient Descent and Ascent with Gradient Tracking (DSGDA-GT), a novel algorithm that only requires first-order oracles that are much cheaper than second-order oracles widely adopted in existing works. We further provide a finite-time convergence analysis showing that for $n$ agents collaboratively solving the DSBO problem, the sample complexity of finding an $\epsilon$-stationary point in our algorithm is $\mathcal{O}(n^{-1}\epsilon^{-7})$, which matches the currently best-known results of the single-agent counterpart with linear speedup. The numerical experiments demonstrate both the communication and training efficiency of our algorithm.
Problem

Research questions and friction points this paper is trying to address.

Decentralized stochastic bilevel optimization with neighbor communication.
Proposes DSGDA-GT, a first-order algorithm avoiding costly second-order oracles.
Achieves linear speedup with sample complexity matching single-agent best results.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized Stochastic Gradient Descent and Ascent
Gradient Tracking for efficient communication
First-order oracles reduce computational cost
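The gradient-tracking idea in the bullets above can be illustrated in isolation. The following is a minimal sketch, not the paper's DSGDA-GT: it runs decentralized gradient descent with gradient tracking on a toy single-level quadratic over a ring graph, and all names, the mixing matrix, the step size, and the objective are illustrative assumptions.

```python
import numpy as np

# Toy setup: n agents on a ring, agent i minimizes (x - targets[i])^2 / 2.
# This is an illustrative single-level problem, not the paper's bilevel one.
rng = np.random.default_rng(0)
n = 4
W = np.zeros((n, n))                    # doubly stochastic mixing matrix (ring)
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

targets = rng.normal(size=n)
grad = lambda i, x: x - targets[i]      # first-order (gradient) oracle only

x = np.zeros(n)                         # one scalar iterate per agent
y = np.array([grad(i, x[i]) for i in range(n)])  # tracker starts at local grads
lr = 0.1
for _ in range(500):
    x_new = W @ x - lr * y              # mix with neighbors, step along tracker
    # Gradient-tracking update: y_i tracks the network-average gradient.
    y = W @ y \
        + np.array([grad(i, x_new[i]) for i in range(n)]) \
        - np.array([grad(i, x[i]) for i in range(n)])
    x = x_new

# All agents reach consensus at the minimizer of the average objective.
print(np.allclose(x, targets.mean(), atol=1e-3))
```

The tracker `y` is the reason the bullets credit GT with "efficient communication": each agent learns the average gradient through neighbor exchanges alone, so no agent ever needs the full global objective or a central coordinator.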
Xiaoyu Wang
The Hong Kong University of Science and Technology
Xuxing Chen
Meta
Optimization · Machine Learning · Applied Math
Shiqian Ma
Rice University
Optimization · Machine Learning · LLM
Tong Zhang
University of Illinois Urbana-Champaign