🤖 AI Summary
Fixed-threshold gradient clipping in differentially private stochastic gradient descent (DP-SGD) requires costly hyperparameter tuning and offers weak robustness. Method: We systematically analyze adaptive quantile-based clipping (QC-SGD) and its differentially private variant (DP-QC-SGD), establishing the first rigorous convergence theory for both under non-convex, smooth objectives. We quantify the coupling among quantile selection, step-size scheduling, and gradient-bias mitigation, and propose a bias-corrected step-size strategy that substantially alleviates the systematic bias induced by quantile estimation. Contribution/Results: Our work fills a critical theoretical gap in adaptive clipping methods, providing the first verifiable convergence guarantees and practical parameter guidance for the quantile-clipping heuristic widely adopted in industry. This enables efficient and robust differentially private model training.
📄 Abstract
Stochastic Gradient Descent (SGD) with gradient clipping is a powerful technique for enabling differentially private optimization. Although prior works have extensively investigated clipping with a constant threshold, private training remains highly sensitive to threshold selection, which can be expensive or even infeasible to tune. This sensitivity motivates adaptive approaches, such as quantile clipping, which have demonstrated empirical success but lack a solid theoretical understanding. This paper provides the first comprehensive convergence analysis of SGD with quantile clipping (QC-SGD). We demonstrate that QC-SGD suffers from a bias problem similar to constant-threshold clipped SGD, but show how this bias can be mitigated through a carefully designed quantile and step-size schedule. Our analysis reveals crucial relationships between quantile selection, step size, and convergence behavior, providing practical guidelines for parameter selection. We extend these results to differentially private optimization, establishing the first theoretical guarantees for DP-QC-SGD. Our findings provide theoretical foundations for the widely used adaptive clipping heuristic and highlight open avenues for future research.
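To make the mechanism concrete, here is a minimal sketch of one quantile-clipped SGD step, with an optional Gaussian-noise term for the differentially private variant. This is an illustrative NumPy implementation based only on the high-level description above; the function name, the parameter `q`, and the noise calibration are assumptions, not the paper's exact algorithm or analysis.

```python
import numpy as np

def quantile_clip_step(params, per_sample_grads, lr=0.1, q=0.9,
                       noise_multiplier=0.0, rng=None):
    """One illustrative QC-SGD step.

    The clipping threshold is set adaptively to the q-th quantile of the
    batch's per-sample gradient norms (instead of a fixed constant); each
    per-sample gradient is rescaled to have norm at most that threshold.
    Setting noise_multiplier > 0 adds Gaussian noise scaled to the
    threshold, sketching the DP-QC-SGD variant.
    """
    rng = rng or np.random.default_rng(0)
    norms = np.linalg.norm(per_sample_grads, axis=1)
    c = np.quantile(norms, q)                         # adaptive threshold
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * scale[:, None]       # per-sample clipping
    g = clipped.mean(axis=0)
    if noise_multiplier > 0.0:                        # Gaussian mechanism
        g = g + rng.normal(0.0, noise_multiplier * c / len(norms),
                           size=g.shape)
    return params - lr * g
```

Because every clipped per-sample gradient has norm at most the quantile threshold `c`, the averaged (noiseless) update is bounded by `lr * c`; this bounded sensitivity is what makes the Gaussian-noise calibration in the private variant possible, and the choice of `q` trades off clipping bias against noise magnitude.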