On the Complexity of Decentralized Smooth Nonconvex Finite-Sum Optimization

📅 2022-10-25

📈 Citations: 8

✨ Influential: 2


🤖 AI Summary

This paper studies decentralized smooth nonconvex finite-sum optimization: minimizing the average of local functions across $m$ agents in a network without a central coordinator, where each agent's local function is the average of $n$ smooth, possibly nonconvex component functions. We propose DEAREST, a novel stochastic decentralized algorithm. First, we introduce two key smoothness parameters: the global Lipschitz constant $L$ and the mean-square Lipschitz constant $\bar{L}$, enabling tighter complexity bounds. Second, we establish lower bounds that prove near-optimality of DEAREST in communication, computation, and gradient oracle complexities. Third, we extend the analysis to the Polyak–Łojasiewicz (PL) setting, where the bounds are again near-optimal. To reach an $\varepsilon$-stationary point, DEAREST requires $\tilde{O}(L\varepsilon^{-2}/\sqrt{\gamma})$ communication rounds, $\tilde{O}(n + (L + \min\{nL,\,\sqrt{n/m}\,\bar{L}\})\varepsilon^{-2})$ computation rounds, and $\tilde{O}(mn + \min\{mnL,\,\sqrt{mn}\,\bar{L}\}\varepsilon^{-2})$ local first-order oracle calls, where $\gamma$ denotes the spectral gap of the mixing matrix.
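To get a feel for how the three bounds scale, the sketch below evaluates only their leading terms, dropping constants and the logarithmic factors hidden by the tilde notation. The function name `dearest_leading_terms` and the sample parameter values are hypothetical and chosen purely for illustration; the numbers it returns are not iteration counts reported in the paper.

```python
import math

def dearest_leading_terms(m, n, L, L_bar, eps, gamma):
    """Leading terms of the stated bounds, ignoring constants and log factors.

    m: number of agents, n: component functions per agent,
    L: global smoothness, L_bar: mean-squared smoothness,
    eps: target stationarity, gamma: spectral gap of the mixing matrix.
    """
    inv_eps2 = eps ** -2
    communication = L * inv_eps2 / math.sqrt(gamma)
    computation = n + (L + min(n * L, math.sqrt(n / m) * L_bar)) * inv_eps2
    oracle_calls = m * n + min(m * n * L, math.sqrt(m * n) * L_bar) * inv_eps2
    return {
        "communication": communication,
        "computation": computation,
        "oracle_calls": oracle_calls,
    }

# Hypothetical setting: 4 agents, 100 components each.
bounds = dearest_leading_terms(m=4, n=100, L=1.0, L_bar=2.0, eps=0.1, gamma=0.25)
for name, value in bounds.items():
    print(f"{name}: {value:.0f}")
```

In this setting the $\min$ terms are decided by $\bar{L}$ rather than $L$, which shows how the mean-squared smoothness parameter can tighten the computation and oracle bounds when $n$ is large.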

📝 Abstract

We study the decentralized optimization problem $\min_{{\bf x}\in{\mathbb R}^d} f({\bf x}) \triangleq \frac{1}{m}\sum_{i=1}^m f_i({\bf x})$, where the local function on the $i$-th agent has the form $f_i({\bf x}) \triangleq \frac{1}{n}\sum_{j=1}^n f_{i,j}({\bf x})$ and every individual $f_{i,j}$ is smooth but possibly nonconvex. We propose a stochastic algorithm called the DEcentralized probAbilistic Recursive gradiEnt deScenT (DEAREST) method, which achieves an $\epsilon$-stationary point at each agent with $\tilde{\mathcal O}(L\epsilon^{-2}/\sqrt{\gamma}\,)$ communication rounds, $\tilde{\mathcal O}(n+(L+\min\{nL, \sqrt{n/m}\bar L\})\epsilon^{-2})$ computation rounds, and ${\mathcal O}(mn + {\min\{mnL, \sqrt{mn}\bar L\}}{\epsilon^{-2}})$ local incremental first-order oracle calls, where $L$ is the smoothness parameter of the objective function, $\bar L$ is the mean-squared smoothness parameter of all individual functions, and $\gamma$ is the spectral gap of the mixing matrix associated with the network. We then establish lower bounds to show that the proposed method is near-optimal. Notice that the smoothness parameters $L$ and $\bar L$ used in our algorithm design and analysis are global, leading to sharper complexity bounds than existing results that depend on local smoothness. We further extend DEAREST to solve the decentralized finite-sum optimization problem under the Polyak-Łojasiewicz condition, also achieving near-optimal complexity bounds.
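The role of the mixing matrix can be illustrated with plain gossip averaging. The sketch below is not the paper's communication scheme: it assumes a ring of agents with uniform weights of $1/3$ (a hypothetical choice) and shows that repeated multiplication by a doubly stochastic mixing matrix drives every agent's value toward the network-wide average. The helper names `ring_mixing_matrix` and `gossip_round` are illustrative.

```python
def ring_mixing_matrix(m):
    """Symmetric, doubly stochastic mixing matrix for a ring of m >= 3 agents."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        # Each agent weights itself and its two ring neighbors equally.
        for j in (i - 1, i, i + 1):
            W[i][j % m] = 1.0 / 3.0
    return W

def gossip_round(W, values):
    """One communication round: each agent takes a weighted average of neighbors."""
    return [sum(w * v for w, v in zip(row, values)) for row in W]

m = 6
W = ring_mixing_matrix(m)
values = [float(i) for i in range(m)]  # local scalars held by the agents
target = sum(values) / m               # network-wide average

for _ in range(100):
    values = gossip_round(W, values)

# Largest deviation of any agent from the average after 100 rounds.
disagreement = max(abs(v - target) for v in values)
print(disagreement)
```

How fast the disagreement shrinks is governed by the spectral gap: a poorly connected network has a small $\gamma$ and needs more rounds. Plain gossip of this kind typically scales with $1/\gamma$, so the $1/\sqrt{\gamma}$ dependence in the stated communication bound would usually require an accelerated mixing scheme, which this sketch does not show.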

Problem

Research questions and friction points this paper is trying to address.

Decentralized optimization of nonconvex finite-sum functions

Achieving near-optimal communication and computation complexity

Extending the method to problems under the Polyak-Łojasiewicz condition

Innovation

Methods, ideas, or system contributions that make the work stand out.

DEAREST, a decentralized stochastic algorithm based on probabilistic recursive gradient descent

Lower bounds showing near-optimal communication and computation complexity

Global smoothness parameters for sharper bounds
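The "probabilistic recursive gradient" idea in the algorithm's name can be sketched on a single agent. The code below is an assumption about the general technique (a PAGE-style estimator), not the authors' DEAREST update: it omits communication, gradient tracking, and mini-batching entirely. The component functions, the parameter list `params`, the refresh probability `p`, and the step size are all hypothetical.

```python
import math
import random

def component_grad(x, a, b):
    """Gradient of the smooth nonconvex component 0.5*(x - a)**2 + b*cos(x)."""
    return (x - a) - b * math.sin(x)

def full_grad(x, params):
    """Average gradient over all components held by one agent."""
    return sum(component_grad(x, a, b) for a, b in params) / len(params)

def recursive_estimate(v_prev, x_new, x_old, params, p, rng):
    """Probabilistic recursive gradient estimate (PAGE-style, an assumption).

    With probability p, recompute the full local gradient; otherwise reuse
    v_prev and correct it with one sampled component's gradient difference.
    """
    if rng.random() < p:
        return full_grad(x_new, params)
    a, b = rng.choice(params)
    return v_prev + component_grad(x_new, a, b) - component_grad(x_old, a, b)

# Hypothetical (a_j, b_j) pairs defining four local components.
params = [(1.0, 1.5), (-1.0, 2.5), (0.5, 2.0), (-0.5, 2.0)]
rng = random.Random(0)
x, step, p = 2.0, 0.1, 0.25
v = full_grad(x, params)  # start from an exact gradient
for _ in range(200):
    x_next = x - step * v
    v = recursive_estimate(v, x_next, x, params, p, rng)
    x = x_next

print(x, full_grad(x, params))
```

The design trade-off is that the cheap branch costs one component gradient pair instead of a full pass over all components, while the occasional full refresh keeps the accumulated estimation error from drifting.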
