🤖 AI Summary
This work studies the stability and generalization behavior of Push-Sum–type decentralized algorithms over directed graphs, where topological asymmetry introduces bias and complicates performance analysis. The authors develop a unified uniform-stability framework that disentangles statistical error from topology-induced bias. They introduce an imbalance-aware consistency bound that, for the first time, decouples the topological imbalance parameter δ from the spectral gap (1−λ), revealing their joint influence on learning performance and clarifying when Push-Sum correction is necessary. By integrating uniform stability theory, Markov chain stationary distributions, and spectral graph theory, they derive a generalization error bound of Õ(1/√(mn) + γ/(δ(1−λ)) + γ) and identify the optimal early stopping time under convex objectives. In the non-convex setting satisfying the Polyak–Łojasiewicz condition, they establish optimization and generalization rates with dominant dependence κ(1+1/(δ(1−λ))), quantifying how topology constrains algorithmic performance.
📝 Abstract
Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $\delta$ and the spectral gap $(1-\lambda)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak--\L{}ojasiewicz condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\frac{\gamma}{\delta(1-\lambda)}+\gamma\right)$ under suitable step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For P\L{} objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $\kappa\!\left(1+\frac{1}{\delta(1-\lambda)}\right)$, where $\kappa$ denotes the condition number, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.
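To make the SGP iteration described above concrete, here is a minimal, self-contained sketch of Push-Sum gradient push on a toy directed graph. The 4-node topology, the quadratic local objectives $f_i(x)=\tfrac12(x-b_i)^2$, and the use of exact rather than stochastic gradients are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

# Illustrative sketch of Stochastic Gradient Push (SGP) over a directed graph.
# Assumptions (not from the paper): a 4-node directed ring with one extra edge
# (0 -> 2), local objectives f_i(x) = 0.5 * (x - b_i)^2, and exact gradients
# in place of stochastic ones, so the run is deterministic.

n = 4
b = np.array([1.0, 2.0, 3.0, 4.0])       # local targets; global optimum = mean(b) = 2.5

# Column-stochastic mixing matrix P (entry P[i, j]: weight node j sends to node i).
# Each node splits its mass uniformly over its out-neighbors (including itself).
P = np.zeros((n, n))
P[[0, 1, 2], 0] = 1.0 / 3.0              # node 0 sends to {0, 1, 2}
P[[1, 2], 1] = 0.5                       # node 1 sends to {1, 2}
P[[2, 3], 2] = 0.5                       # node 2 sends to {2, 3}
P[[3, 0], 3] = 0.5                       # node 3 sends to {3, 0}
assert np.allclose(P.sum(axis=0), 1.0)   # column-stochastic, but NOT row-stochastic

alpha, T = 0.01, 5000
x = b.copy()                             # parameters (carry topology-induced bias)
w = np.ones(n)                           # Push-Sum weights

for _ in range(T):
    z = x / w                            # de-biased estimates; gradients use these
    g = z - b                            # gradient of f_i at z_i
    x = P @ (x - alpha * g)              # gossip the gradient-updated parameters
    w = P @ w                            # gossip the weights along the same matrix

z = x / w
print("de-biased estimates:", z)         # each z_i near mean(b) = 2.5
print("Push-Sum weights:", w)            # nonuniform: reflects the imbalance delta
```

Because $P$ is column-stochastic but not row-stochastic here, the raw iterates $x_i$ alone would drift toward nodes with larger stationary weight; dividing by the Push-Sum weights $w_i$ (which converge to $n\pi_i$) removes exactly this topology-induced bias, which is the correction whose necessity the abstract analyzes.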