High-Probability Bounds for SGD under the Polyak-Łojasiewicz Condition with Markovian Noise

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of uniform-in-time high-probability convergence guarantees for stochastic gradient descent (SGD) whose gradient noise contains both Markovian and martingale-difference components, even when the objective satisfies the Polyak–Łojasiewicz (PL) condition. The paper establishes, for the first time, high-probability convergence bounds for SGD with Markov noise under the PL condition, allowing the noise magnitude to scale with the function value, a setting relevant to decentralized optimization, privacy-amplified sampling, and online system identification. By characterizing the Markov noise via the Poisson equation and employing a probabilistic induction argument to circumvent the absence of almost-sure bounds on the objective, the authors also prove that the expected suboptimality decays at the optimal rate of $1/k$. The theoretical findings are validated through experiments on token-based decentralized linear regression, privacy-amplified supervised learning, and online system identification.
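
For reference, the setting the summary describes can be written as follows (the notation is illustrative, not taken from the paper). A differentiable objective $f$ with minimum value $f^\star$ satisfies the PL condition if

$$\|\nabla f(x)\|^2 \ge 2\mu \bigl(f(x) - f^\star\bigr) \quad \text{for some } \mu > 0 \text{ and all } x,$$

and SGD iterates $x_{k+1} = x_k - \alpha_k\, g(x_k, \xi_k)$, where the driving sequence $(\xi_k)$ is a Markov chain rather than i.i.d. With step sizes $\alpha_k \propto 1/k$, the guarantee is $\mathbb{E}[f(x_k) - f^\star] = O(1/k)$, alongside the uniform-in-time high-probability bound. The Poisson equation mentioned above is the standard device for such correlated noise: for the chain's transition kernel $P$ and a function $h$ with stationary mean $\bar h$, one solves $V - PV = h - \bar h$, which lets the non-i.i.d. part of the noise be rewritten as a martingale difference plus a telescoping remainder.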

📝 Abstract
We present the first uniform-in-time high-probability bound for SGD under the PL condition, where the gradient noise contains both Markovian and martingale difference components. This significantly broadens the scope of finite-time guarantees, as the PL condition arises in many machine learning and deep learning models while Markovian noise naturally arises in decentralized optimization and online system identification problems. We further allow the magnitude of noise to grow with the function value, enabling the analysis of many practical sampling strategies. In addition to the high-probability guarantee, we establish a matching $1/k$ decay rate for the expected suboptimality. Our proof technique relies on the Poisson equation to handle the Markovian noise and a probabilistic induction argument to address the lack of almost-sure bounds on the objective. Finally, we demonstrate the applicability of our framework by analyzing three practical optimization problems: token-based decentralized linear regression, supervised learning with subsampling for privacy amplification, and online system identification.
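
To make the Markovian-noise setting concrete, here is a minimal sketch (not the authors' code; the ring graph, data model, and step sizes are illustrative assumptions) of token-based decentralized linear regression, the first experiment mentioned above: a token performs a random walk over the nodes of a graph, and each SGD step uses the gradient of whichever node currently holds the token, so the sample sequence is a Markov chain rather than i.i.d.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares objective f(x) = (1/2n) * sum_i (a_i @ x - b_i)^2;
# quadratics like this satisfy the PL condition when A has full column rank.
n, d = 20, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)  # noiseless targets, so f* = 0

# Node i holds sample (A[i], b[i]); the token walks on a ring graph, so the
# visited-node sequence (the gradient noise) is a Markov chain, not i.i.d.
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

x = np.zeros(d)
node = 0
for k in range(1, 100_001):
    node = int(rng.choice(neighbors[node]))     # Markov transition of the token
    residual = A[node] @ x - b[node]
    x -= (2.0 / (k + 20)) * residual * A[node]  # O(1/k) step size, matching the rate

print("suboptimality f(x_K) - f*:", np.sum((A @ x - b) ** 2) / (2 * n))
```

Because the random walk on a regular undirected graph has a uniform stationary distribution, the token's gradients are unbiased in the stationary regime even though consecutive samples are strongly correlated, and it is exactly this transient correlation that the Poisson-equation argument is designed to absorb.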
Problem

Research questions and friction points this paper is trying to address.

SGD
Polyak-Łojasiewicz condition
Markovian noise
high-probability bound
stochastic optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Polyak-Łojasiewicz condition
Markovian noise
high-probability bound
stochastic gradient descent
Poisson equation