Differential Privacy in Two-Layer Networks: How DP-SGD Harms Fairness and Robustness

📅 2026-03-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the performance degradation, fairness deterioration, and reduced adversarial robustness commonly observed in differentially private training, whose underlying mechanisms in non-convex neural networks remain poorly understood. We propose a unified feature-centering framework to theoretically analyze the feature learning dynamics of DP-SGD in two-layer ReLU convolutional networks. For the first time from a feature learning perspective, we reveal that privacy-preserving noise induces fairness and robustness issues through an imbalance in the feature-to-noise ratio (FNR), particularly affecting inter-class separation and long-tailed samples. We derive a bound on test loss and validate on both synthetic and real-world data that FNR imbalance leads to suboptimal feature learning, thereby exacerbating group disparities and adversarial vulnerability. Our findings also cast doubt on the universal efficacy of the "public pretraining + private finetuning" paradigm.
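The summary above centers on the feature-to-noise ratio (FNR). As a rough, hypothetical illustration of such a metric (this is not the paper's formal definition), one can compare the energy of a learned weight vector along known class-feature directions to the residual, noise-aligned energy:

```python
import numpy as np

def feature_to_noise_ratio(weights, feature_dirs):
    """Hypothetical FNR proxy: energy of `weights` in the span of the
    class feature directions, divided by the residual (noise) energy.

    weights:      (d,) learned weight vector
    feature_dirs: (k, d) rows are class feature directions
    """
    # Orthonormal basis of the feature subspace.
    Q, _ = np.linalg.qr(feature_dirs.T)
    signal = Q @ (Q.T @ weights)   # projection onto feature span
    noise = weights - signal       # residual component
    return np.linalg.norm(signal) / (np.linalg.norm(noise) + 1e-12)
```

Under this proxy, a weight vector aligned with its class feature has a large FNR, while one dominated by memorized noise has an FNR near zero; the paper's argument is that DP noise pushes training toward the latter regime, unevenly across classes.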

๐Ÿ“ Abstract
Differentially private learning is essential for training models on sensitive data, but empirical studies consistently show that it can degrade performance, introduce fairness issues like disparate impact, and reduce adversarial robustness. The theoretical underpinnings of these phenomena in modern, non-convex neural networks remain largely unexplored. This paper introduces a unified feature-centric framework to analyze the feature learning dynamics of differentially private stochastic gradient descent (DP-SGD) in two-layer ReLU convolutional neural networks. Our analysis establishes test loss bounds governed by a crucial metric: the feature-to-noise ratio (FNR). We demonstrate that the noise required for privacy leads to suboptimal feature learning, and specifically show that: 1) imbalanced FNRs across classes and subpopulations cause disparate impact; 2) even in the same class, noise has a greater negative impact on semantically long-tailed data; and 3) noise injection exacerbates vulnerability to adversarial attacks. Furthermore, our analysis reveals that the popular paradigm of public pre-training and private fine-tuning does not guarantee improvement, particularly under significant feature distribution shifts between datasets. Experiments on synthetic and real-world data corroborate our theoretical findings.
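The abstract analyzes DP-SGD, which differs from plain SGD in two steps: each per-example gradient is clipped to a fixed norm, and Gaussian noise is added to the averaged gradient before the update. A minimal NumPy sketch of one update step (parameter names such as `noise_multiplier` are illustrative conventions, not taken from the paper):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, lr, params, rng):
    """One DP-SGD update: clip each per-example gradient to clip_norm,
    average, then add Gaussian noise with std noise_multiplier * clip_norm
    (scaled by batch size) before the gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=mean_grad.shape,
    )
    return params - lr * (mean_grad + noise)
```

The injected noise is what the paper's analysis targets: its scale is fixed by the privacy budget, so examples whose clipped gradient signal is weak (e.g. long-tailed samples) see a lower effective signal-to-noise ratio than head-class examples.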
Problem

Research questions and friction points this paper is trying to address.

Differential Privacy
Fairness
Robustness
Feature Learning
DP-SGD
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differential Privacy
Feature-to-Noise Ratio
DP-SGD
Fairness
Adversarial Robustness