🤖 AI Summary
This paper addresses the challenge of selecting two key hyperparameters in differentially private deep transfer learning, the clipping bound $C$ and the batch size $B$: while theory prescribes a smaller $C$ under strong privacy, empirical evidence shows that a larger $C$ yields better utility; existing heuristics for tuning $B$ fail under limited compute budgets, and reusing a fixed $(C,B)$ across tasks significantly degrades performance. The authors identify shifts in the gradient distribution as the root cause of this theory–practice gap and introduce cumulative DP noise as a key quantity characterizing batch-size effects. Using gradient analysis and noise modeling, they systematically evaluate hyperparameter combinations under a fixed compute budget. The experiments confirm the advantage of a larger $C$ under strong privacy, establish a principled criterion for selecting $B$, and substantially improve the privacy–utility trade-off.
📝 Abstract
Differentially private (DP) transfer learning, i.e., fine-tuning a pretrained model on private data, is the current state-of-the-art approach for training large models under privacy constraints. We focus on two key hyperparameters in this setting: the clipping bound $C$ and batch size $B$. We show a clear mismatch between the current theoretical understanding of how to choose an optimal $C$ (stronger privacy requires smaller $C$) and empirical outcomes (larger $C$ performs better under strong privacy), caused by changes in the gradient distributions. Assuming a limited compute budget (fixed epochs), we demonstrate that the existing heuristics for tuning $B$ do not work, while cumulative DP noise better explains whether smaller or larger batches perform better. We also highlight how the common practice of using a single $(C,B)$ setting across tasks can lead to suboptimal performance. We find that performance drops especially when moving between loose and tight privacy and between plentiful and limited compute, which we explain by analyzing clipping as a form of gradient re-weighting and examining cumulative DP noise.
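The two quantities the abstract analyzes can be sketched in a few lines. Below is a minimal, hedged illustration (not the paper's code): `clip_gradient` shows why per-sample clipping acts as gradient re-weighting, and `cumulative_noise_std` models the cumulative DP noise under a fixed-epoch budget, assuming a standard DP-SGD step that adds Gaussian noise of std $\sigma C / B$ to the averaged gradient with a constant learning rate. The function names and the simplification that the noise multiplier $\sigma$ is held fixed (in practice it depends on $B$ through privacy accounting) are assumptions for illustration.

```python
import numpy as np

def clip_gradient(g, C):
    """Per-sample clipping: scale g by min(1, C / ||g||_2).

    Samples with ||g|| > C contribute a gradient of norm exactly C,
    so shrinking C re-weights the batch average toward small-gradient
    samples -- the re-weighting view of clipping used in the abstract.
    """
    norm = np.linalg.norm(g)
    return g * min(1.0, C / norm)

def cumulative_noise_std(sigma, C, B, N, epochs):
    """Std of the DP noise accumulated over a fixed-epoch run.

    Simplified model: each DP-SGD step adds Gaussian noise with std
    sigma * C / B to the averaged gradient; a fixed-epoch budget gives
    T = epochs * N / B steps, and independent per-step noise accumulates
    as sqrt(T) * sigma * C / B per coordinate (constant learning rate,
    sigma treated as independent of B for illustration).
    """
    T = epochs * N / B
    return np.sqrt(T) * sigma * C / B
```

Note that the cumulative noise scales as $B^{-3/2}$ in this simplified model, so doubling $B$ cuts it by about $2\sqrt{2}$, which is why per-step heuristics for choosing $B$ can mislead under a fixed compute budget.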