DC-SGD: Differentially Private SGD with Dynamic Clipping through Gradient Norm Distribution Estimation

๐Ÿ“… 2025-03-29
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the challenge of balancing privacy and utility in DP-SGD—stemming from manually specified gradient clipping thresholds C—and the high computational cost of hyperparameter tuning, this paper proposes a framework that estimates the gradient norm distribution via differentially private histograms and dynamically adapts C during training. We introduce two novel mechanisms: DC-SGD-P (quantile-driven) and DC-SGD-E (minimizing expected squared error), the first to enable end-to-end, on-the-fly adaptation of C without manual intervention. The approach is compatible with Adam, and we provide theoretical guarantees for (ε, δ)-differential privacy and bounded convergence. Experiments on CIFAR-10 demonstrate a 10.62% accuracy improvement under identical privacy budgets compared to standard DP-SGD, while reducing hyperparameter search overhead by a factor of 9.
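For context, the clipping threshold C the summary refers to enters DP-SGD as follows: each per-sample gradient is rescaled so its L2 norm is at most C, and Gaussian noise proportional to C is added to the aggregate. A minimal NumPy sketch of that step (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def dp_sgd_step(per_sample_grads, C, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each per-sample gradient to
    L2 norm at most C, sum, add Gaussian noise scaled to C, average."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # scale the gradient down only if its norm exceeds C
        clipped.append(g * min(1.0, C / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * C, size=total.shape)
    return (total + noise) / len(per_sample_grads)

rng = np.random.default_rng(0)
grads = [rng.normal(size=4) for _ in range(8)]
update = dp_sgd_step(grads, C=1.0, noise_multiplier=1.1, rng=rng)
```

The trade-off the paper targets is visible here: a small C clips aggressively (bias), while a large C inflates the noise standard deviation `noise_multiplier * C` (variance).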

๐Ÿ“ Abstract
Differentially Private Stochastic Gradient Descent (DP-SGD) is a widely adopted technique for privacy-preserving deep learning. A critical challenge in DP-SGD is selecting the optimal clipping threshold C, which involves balancing the trade-off between clipping bias and noise magnitude, incurring substantial privacy and computing overhead during hyperparameter tuning. In this paper, we propose Dynamic Clipping DP-SGD (DC-SGD), a framework that leverages differentially private histograms to estimate gradient norm distributions and dynamically adjust the clipping threshold C. Our framework includes two novel mechanisms: DC-SGD-P and DC-SGD-E. DC-SGD-P adjusts the clipping threshold based on a percentile of gradient norms, while DC-SGD-E minimizes the expected squared error of gradients to optimize C. These dynamic adjustments significantly reduce the burden of tuning C. Extensive experiments on various deep learning tasks, including image classification and natural language processing, show that our proposed dynamic algorithms achieve up to a 9× speedup in hyperparameter tuning over DP-SGD, and DC-SGD-E achieves an accuracy improvement of 10.62% on CIFAR-10 over DP-SGD under the same privacy budget for hyperparameter tuning. We conduct rigorous theoretical privacy and convergence analyses, showing that our methods seamlessly integrate with the Adam optimizer. Our results highlight the robust performance and efficiency of DC-SGD, offering a practical solution for differentially private deep learning with reduced computational overhead and enhanced privacy guarantees.
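The DC-SGD-P idea the abstract describes (set C to a percentile of the gradient-norm distribution, estimated privately) can be sketched with a Laplace-noised histogram. This is my illustrative reading, not the paper's exact algorithm; the function name, binning, and budget split are assumptions:

```python
import numpy as np

def private_percentile_threshold(grad_norms, p, bins, eps, rng):
    """Estimate the p-th percentile of gradient norms from a
    differentially private histogram (Laplace mechanism, budget eps)."""
    counts, edges = np.histogram(grad_norms, bins=bins)
    # each sample affects one bucket, so sensitivity is 1 per bucket count
    noisy = counts + rng.laplace(0.0, 1.0 / eps, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)
    cdf = np.cumsum(noisy) / max(noisy.sum(), 1e-12)
    idx = int(np.searchsorted(cdf, p))
    idx = min(idx, len(edges) - 2)
    # return the upper edge of the bucket containing percentile p
    return edges[idx + 1]
```

Because only noisy bucket counts are released, the threshold update itself consumes a small, accountable slice of the privacy budget instead of requiring a separate tuning run per candidate C.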
Problem

Research questions and friction points this paper is trying to address.

Optimizing clipping threshold in DP-SGD for privacy
Reducing hyperparameter tuning overhead in private learning
Balancing gradient clipping bias and noise magnitude
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic clipping via gradient norm estimation
Private histograms for threshold adjustment
Optimizes clipping to reduce tuning overhead
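The second mechanism, DC-SGD-E, picks C by minimizing an expected squared-error estimate that combines clipping loss and injected-noise variance. A hedged sketch of that selection rule over a candidate grid, using histogrammed norms (the error decomposition below is a plausible simplification, not the paper's exact objective):

```python
import numpy as np

def choose_threshold(hist_counts, bin_centers, sigma, d, batch_size, candidates):
    """Pick C minimizing estimated clipping error plus noise variance,
    using a histogram of gradient norms (counts over bin_centers)."""
    n = hist_counts.sum()
    best_C, best_err = None, np.inf
    for C in candidates:
        # clipping error: norms above C lose roughly (norm - C)^2 in squared length
        over = bin_centers > C
        clip_err = np.sum(hist_counts[over] * (bin_centers[over] - C) ** 2) / n
        # noise error: Gaussian noise with std sigma*C per coordinate,
        # averaged over the batch, across d parameters
        noise_err = d * (sigma * C / batch_size) ** 2
        err = clip_err + noise_err
        if err < best_err:
            best_C, best_err = C, err
    return best_C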
๐Ÿ”Ž Similar Papers
2023-10-27arXiv.orgCitations: 1