Towards hyperparameter-free optimization with differential privacy

📅 2025-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
In differentially private (DP) deep learning, manually tuning the learning rate and the per-sample gradient clipping threshold incurs high computational overhead and can hurt generalization. To address this, we propose the first end-to-end, hyperparameter-free DP optimization framework. Our method jointly designs an automatic learning rate scheduler and an adaptive per-sample gradient clipping mechanism; their co-design eliminates data-dependent hyperparameter selection, requiring neither grid search nor validation-set feedback. The framework integrates seamlessly with DP-SGD, adaptive optimizers, and dynamic clipping, substantially narrowing the privacy–utility gap. Evaluated on language and vision benchmarks under ε ≤ 8, our approach achieves state-of-the-art DP performance while attaining training efficiency nearly matching non-private baselines, with less than a 15% increase in computational overhead.
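For context, the baseline the paper improves on is standard DP-SGD, where each per-sample gradient is clipped to a threshold C and Gaussian noise calibrated to C is added. A minimal NumPy sketch (the names `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are illustrative, not from the paper) shows exactly which hyperparameters the framework aims to remove:

```python
import numpy as np

def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """One standard DP-SGD update (the baseline, not the paper's method):
    clip each per-sample gradient to norm `clip_norm` (C), average, then
    add Gaussian noise with std sigma*C/B. Both C and the learning rate
    applied to this update are the data-dependent hyperparameters that
    the proposed framework eliminates."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down only if the gradient exceeds the threshold.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_sample_grads),
        size=mean_grad.shape,
    )
    return mean_grad + noise
```

With `noise_multiplier=0` the step reduces to plain clipped averaging, which makes the clipping behavior easy to check in isolation.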

📝 Abstract
Differential privacy (DP) is a privacy-preserving paradigm that protects the training data when training deep learning models. Critically, model performance is determined by the training hyperparameters, especially those of the learning rate schedule, thus requiring fine-grained hyperparameter tuning on the data. In practice, it is common to tune the learning rate hyperparameters through grid search, which (1) is computationally expensive, as multiple runs are needed, and (2) increases the risk of data leakage, as the selection of hyperparameters is data-dependent. In this work, we adapt the automatic learning rate schedule to DP optimization for any model and optimizer, so as to significantly mitigate or even eliminate the cost of hyperparameter tuning when applied together with automatic per-sample gradient clipping. Our hyperparameter-free DP optimization is almost as computationally efficient as standard non-DP optimization, and achieves state-of-the-art DP performance on various language and vision tasks.
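The "automatic per-sample gradient clipping" the abstract pairs with the learning rate schedule replaces the tunable threshold with per-sample normalization, in the style of prior work on automatic clipping (Bu et al.). A hedged sketch (function name and the stability constant `gamma` are illustrative; the co-designed scheduler itself is described in the paper and not reproduced here):

```python
import numpy as np

def automatic_clip(per_sample_grads, gamma=0.01):
    """Automatic (normalized) per-sample clipping: scale each gradient by
    1 / (||g|| + gamma) instead of min(1, C / ||g||). Every per-sample
    gradient then has norm strictly below 1, so the clipping threshold C
    disappears as a hyperparameter; gamma is a small stability constant
    that also prevents division by zero."""
    return [g / (np.linalg.norm(g) + gamma) for g in per_sample_grads]
```

Because the resulting gradients all have norm below 1, the Gaussian noise scale no longer depends on a tuned threshold, which is what lets the clipping step be removed from the grid search.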
Problem

Research questions and friction points this paper is trying to address.

Learning rate and clipping hyperparameters in DP optimization require fine-grained, data-dependent tuning
Grid search is computationally expensive and increases the risk of data leakage
Reaching state-of-the-art DP performance without this tuning overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic learning rate schedule adaptation
Hyperparameter-free DP optimization
Automatic per-sample gradient clipping