Differentially Private Prototypes for Imbalanced Transfer Learning

📅 2024-06-12

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

To address the performance degradation of DP-SGD under high privacy (ε ≤ 1), limited private data, and class imbalance, this paper proposes Differentially Private Prototype Learning (DPPL), a new paradigm for private transfer learning. DPPL combines a non-iterative noise injection mechanism with public-data-guided private prototype sampling: it generates ε-differentially private class prototypes (ε ≤ 2.0) from only a few private samples, without perturbing gradients or model parameters. It integrates pretrained encoder-based feature extraction with public-data-augmented prototype construction, ensuring strong privacy guarantees (δ = 10⁻⁵) while substantially improving generalization. Extensive evaluation across four vision benchmarks, four state-of-the-art encoders, and multiple class-imbalanced settings demonstrates that DPPL consistently outperforms DP-SGD and other baselines across ε ∈ [0.5, 2.0]. Notably, under extreme data scarcity (<10 samples per class), DPPL achieves up to 12.7% absolute accuracy gain.

📝 Abstract

Machine learning (ML) models have been shown to leak private information from their training datasets. Differential Privacy (DP), typically implemented through the differentially private stochastic gradient descent algorithm (DP-SGD), has become the standard solution to bound leakage from the models. Despite recent improvements, DP-SGD-based approaches for private learning still usually struggle in the high privacy ($\varepsilon \le 1$) and low data regimes, and when the private training datasets are imbalanced. To overcome these limitations, we propose Differentially Private Prototype Learning (DPPL) as a new paradigm for private transfer learning. DPPL leverages publicly pre-trained encoders to extract features from private data and generates DP prototypes that represent each private class in the embedding space and can be publicly released for inference. Since our DP prototypes can be obtained from only a few private training data points and without iterative noise addition, they offer high-utility predictions and strong privacy guarantees even under the notion of \textit{pure DP}. We additionally show that privacy-utility trade-offs can be further improved when leveraging the public data beyond pre-training of the encoder: in particular, we can privately sample our DP prototypes from the publicly available data points used to train the encoder. Our experimental evaluation with four state-of-the-art encoders, four vision datasets, and under different data and imbalancedness regimes demonstrates DPPL's high performance under strong privacy guarantees in challenging private learning setups.
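
The idea of a noisy class prototype in embedding space can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Laplace mechanism on an L1-clipped mean, replace-one adjacency within a single class, and a class size treated as public. The function names (`clip_l1`, `private_mean_prototype`, `predict`) and the cosine-similarity classifier are hypothetical choices made for this example, and privacy accounting across classes is left out.

```python
import numpy as np

def clip_l1(embeddings, clip_norm):
    """Scale each row so that its L1 norm is at most clip_norm."""
    norms = np.abs(embeddings).sum(axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return embeddings * factors

def private_mean_prototype(embeddings, epsilon, clip_norm=1.0, rng=None):
    """Noisy mean of one class's embeddings (illustrative assumption).

    Replacing one of the n clipped rows moves the mean by at most
    2 * clip_norm / n in L1 norm, so per-coordinate Laplace noise with
    scale sensitivity / epsilon gives epsilon-DP for this single release.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, dim = embeddings.shape
    clipped = clip_l1(embeddings, clip_norm)
    sensitivity = 2.0 * clip_norm / n
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=dim)
    return clipped.mean(axis=0) + noise

def predict(queries, prototypes):
    """Assign each query to the prototype with the highest cosine similarity."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argmax(q @ p.T, axis=1)
```

In this sketch the noise is added once per class rather than at every training step, which is the property the abstract refers to as avoiding iterative noise addition. The embeddings would come from a frozen, publicly pre-trained encoder, which is outside the scope of the example.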

Problem

Research questions and friction points this paper is trying to address.

Addresses privacy leakage in machine learning models

Improves private learning in high privacy regimes

Enhances transfer learning with imbalanced datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentially Private Prototype Learning

Leverages pre-trained encoders

Generates DP prototypes that can be publicly released for inference
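
The abstract also mentions privately sampling prototypes from public data points. One plausible way to do this, sketched here as an assumption and not as the paper's exact procedure, is the exponential mechanism: each public candidate is scored by its summed similarity to the private embeddings of a class, and one candidate is sampled with probability proportional to exp(ε · score / 2). The function names and the rescaled cosine utility are hypothetical choices for this example.

```python
import numpy as np

def unit_rows(x):
    """Normalize each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), 1e-12)

def sample_public_prototype(private_emb, public_emb, epsilon, rng=None):
    """Pick the index of one public point via the exponential mechanism.

    Each private sample contributes a score in [0, 1] to every candidate,
    so adding, removing, or replacing one private sample changes any
    candidate's utility by at most 1 (sensitivity 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    cos = np.clip(unit_rows(public_emb) @ unit_rows(private_emb).T, -1.0, 1.0)
    utility = ((1.0 + cos) / 2.0).sum(axis=1)  # shape: (n_public,)
    logits = epsilon * utility / 2.0           # epsilon * u / (2 * sensitivity)
    logits -= logits.max()                     # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(public_emb), p=probs))
```

Because the released prototype is itself a public data point, the only private information it carries is which candidate was selected, and that selection is what the mechanism protects in this sketch.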
