🤖 AI Summary
To address the accuracy degradation caused by gradient perturbation in differentially private federated learning (DPFL), this paper introduces ImageNet-1K pre-trained models into the DPFL framework for the first time and systematically compares head fine-tuning (HT) with full fine-tuning (FT) for optimizing the privacy–utility trade-off. Experiments on CIFAR-10, CHMNIST, and Fashion-MNIST show that HT consistently outperforms FT, achieving up to an 8.2% absolute accuracy gain, especially under tight privacy budgets (e.g., ε ≤ 4) or with larger models. The key mechanism is the reduction in the number of gradient exposures, which mitigates cumulative noise amplification. The paper proposes HT as a novel, efficient fine-tuning paradigm for DPFL and supports it with gradient noise analysis and visualization-based interpretability, yielding a reproducible, scalable approach that jointly improves privacy guarantees and model utility without architectural modifications or additional hyperparameters.
📝 Abstract
Pre-training leverages public datasets to build an advanced machine learning model that can then be easily tuned to adapt to various downstream tasks, and it has been extensively explored as a way to reduce computation and communication costs. Inspired by these advantages, we are the first to explore how model pre-training can mitigate the detrimental effect of noise in differentially private federated learning (DPFL). DPFL extends federated learning (FL), the de-facto standard for preserving privacy when training a model across multiple clients holding private data, by adding differentially private (DP) noise to obfuscate the model gradients exposed in FL; this noise, however, can considerably impair model accuracy. In this work, we conduct a comprehensive empirical study comparing two pre-training-based approaches, head fine-tuning (HT) and full fine-tuning (FT), with scratch training (ST) in DPFL. Our experiments tune pre-trained models (obtained by pre-training on ImageNet-1K) with the CIFAR-10, CHMNIST, and Fashion-MNIST (FMNIST) datasets, respectively. The results demonstrate that HT and FT can significantly mitigate the influence of noise by reducing the number of times gradients are exposed. In particular, HT outperforms FT when the privacy budget is tight or the model is large. A visualization and explanation study further substantiates our findings. Our pioneering study introduces a new perspective on enhancing DPFL and expanding its practical applications.
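The intuition behind HT's advantage can be sketched numerically. The toy example below (a minimal sketch, not the paper's implementation; the DP-SGD-style clip-and-perturb mechanism, dimensions, and noise parameters are all illustrative assumptions) applies gradient clipping and Gaussian perturbation to a full-model gradient versus a head-only gradient. Because the injected noise norm grows roughly with the square root of the number of trainable coordinates, freezing the backbone and tuning only the head shrinks the per-round perturbation substantially:

```python
# Illustrative sketch (not the paper's code): in DP-SGD-style training,
# each exposed gradient is clipped and perturbed with Gaussian noise, so
# total injected noise grows with the number of trainable coordinates.
# All dimensions and DP parameters below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def dp_noisy_gradient(grad, clip_norm=1.0, noise_multiplier=1.0):
    """Clip a gradient vector to clip_norm, then add Gaussian noise."""
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

backbone_dim = 100_000  # hypothetical frozen feature-extractor size
head_dim = 1_000        # hypothetical classification-head size

# Full fine-tuning (FT): every coordinate is trainable and perturbed.
ft_noisy = dp_noisy_gradient(rng.normal(size=backbone_dim + head_dim))

# Head fine-tuning (HT): only the head is trainable, so far fewer
# coordinates receive noise each round.
ht_noisy = dp_noisy_gradient(rng.normal(size=head_dim))

# Noise norm scales like noise_multiplier * clip_norm * sqrt(dimension),
# so HT's perturbation is roughly sqrt(101) ≈ 10x smaller here.
print(f"FT noisy update norm: {np.linalg.norm(ft_noisy):.1f}")
print(f"HT noisy update norm: {np.linalg.norm(ht_noisy):.1f}")
```

This back-of-the-envelope scaling is consistent with the paper's finding that HT's advantage widens as the model grows: the backbone dominates the parameter count, so FT's noise burden increases with model size while HT's stays fixed by the head dimension.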