Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level

📅 2025-07-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing high-probability convergence analyses of differentially private (DP) clipped SGD rely on *increasing* clipping thresholds, rendering them incompatible with standard DP mechanisms such as the Gaussian mechanism. This work establishes the first high-probability convergence theory for DP clipped SGD under a *fixed* clipping threshold and heavy-tailed noise, specifically noise with a bounded central α-th moment for α ∈ (1, 2]. The analysis covers both convex and non-convex smooth optimization. Methodologically, it dispenses with the conventional reliance on growing clipping thresholds, instead using refined moment control and concentration inequalities. The resulting rate is faster than existing ones and holds up to a neighborhood of the solution whose radius can be balanced against the scale of the privacy-preserving noise, yielding a refined trade-off between convergence speed and privacy guarantees.
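
For reference, the bounded central α-th moment condition mentioned above is typically stated as follows; the notation here is a standard rendering of that assumption and may differ from the paper's exact formulation:

```latex
% Heavy-tailed noise assumption: the stochastic gradient g(x) is unbiased and
% its deviation from the true gradient has a bounded \alpha-th central moment,
% with \alpha \in (1, 2]; \alpha = 2 recovers the usual bounded-variance case.
\mathbb{E}\left[ g(x) \right] = \nabla f(x), \qquad
\mathbb{E}\left[ \left\| g(x) - \nabla f(x) \right\|^{\alpha} \right] \le \sigma^{\alpha},
\quad \alpha \in (1, 2].
```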

📝 Abstract
Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is also a crucial component of Differential Privacy (DP) mechanisms. However, existing high-probability convergence analyses typically require the clipping threshold to increase with the number of optimization steps, which is incompatible with standard DP mechanisms like the Gaussian mechanism. In this work, we close this gap by providing the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, applicable to both convex and non-convex smooth optimization under heavy-tailed noise, characterized by a bounded central $\alpha$-th moment assumption, $\alpha \in (1,2]$. Our results show that, with a fixed clipping level, the method converges to a neighborhood of the optimal solution with a faster rate than the existing ones. The neighborhood can be balanced against the noise introduced by DP, providing a refined trade-off between convergence speed and privacy guarantees.
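
Per the abstract, the core mechanism is SGD with a fixed per-step clipping level combined with Gaussian noise. Below is a minimal sketch of one such update in NumPy; the function name, hyperparameters, and noise calibration are illustrative assumptions, not the paper's pseudocode.

```python
import numpy as np

def dp_clipped_sgd_step(w, per_example_grads, clip_level, noise_multiplier, lr, rng):
    """One illustrative DP-Clipped-SGD update: clip each per-example gradient
    to a fixed norm clip_level, average, add Gaussian-mechanism noise whose
    scale is tied to that fixed clip level, then take a gradient step.
    A sketch under assumed conventions, not the paper's algorithm verbatim."""
    batch_size = len(per_example_grads)
    clipped = [
        g * min(1.0, clip_level / (np.linalg.norm(g) + 1e-12))  # norm clipping
        for g in per_example_grads
    ]
    avg_grad = np.mean(clipped, axis=0)
    # Gaussian mechanism: the per-example sensitivity of the averaged gradient
    # is clip_level / batch_size. Keeping clip_level fixed keeps this
    # sensitivity constant across steps, which standard DP accounting expects.
    noise = rng.normal(0.0, noise_multiplier * clip_level / batch_size, size=w.shape)
    return w - lr * (avg_grad + noise)
```

Because `clip_level` is fixed, the Gaussian noise scale stays constant across iterations; this is the compatibility with standard DP mechanisms that motivates the paper's fixed-threshold analysis.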
Problem

Research questions and friction points this paper is trying to address.

Analyzing high-probability convergence of DP-Clipped-SGD with fixed clipping
Bridging gap between gradient clipping and differential privacy mechanisms
Optimizing trade-off between convergence speed and privacy guarantees
Innovation

Methods, ideas, or system contributions that make the work stand out.

DP-Clipped-SGD with fixed clipping level
High-probability convergence under heavy-tailed noise
Balanced trade-off between speed and privacy