🤖 AI Summary
Federated learning inherently involves structured stochasticity, such as partial client participation across iterations and random subnetwork updates, that induces privacy amplification unaccounted for by conventional differential privacy (DP) analyses.
Method: We propose Balanced Iteration Subsampling to replace Poisson sampling, enabling more accurate modeling of the training mechanism's structured, non-i.i.d. randomness; we systematically formalize the privacy amplification afforded by model partitioning, Dropout, and stochastic subnetwork updates.
Contribution/Results: We theoretically prove and empirically validate that such structured randomness yields significant, quantifiable, and previously overlooked privacy gains, enhancing DP guarantees without sacrificing model utility. Our work establishes the first unified analytical framework jointly incorporating model parallelism and data sampling into privacy amplification analysis, bridging a critical gap between practical federated training dynamics and rigorous DP accounting.
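For intuition, here is a minimal sketch of the model-partitioning mechanism summarized above, assuming a Dropout-style independent coordinate mask; the function names and the `keep_rate` parameter are illustrative assumptions, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subnetwork_mask(dim: int, keep_rate: float) -> np.ndarray:
    # Dropout-style mask: each parameter coordinate is selected for
    # update independently with probability keep_rate (hypothetical name).
    return rng.random(dim) < keep_rate

def masked_client_update(params: np.ndarray, grad: np.ndarray,
                         mask: np.ndarray, lr: float) -> np.ndarray:
    # The client touches only the masked coordinates, so its data
    # influences just a random subset of the model parameters.
    update = np.zeros_like(params)
    update[mask] = -lr * grad[mask]
    return update

# Example: a client round that updates roughly half of a 10-dim model.
params = np.zeros(10)
grad = np.ones(10)
mask = random_subnetwork_mask(dim=10, keep_rate=0.5)
new_params = params + masked_client_update(params, grad, mask, lr=0.1)
```

Because each sample's gradient only ever reaches the randomly masked coordinates, an observer of the released model sees a noisier, partial view of that sample's contribution, which is the intuition behind the amplification gain.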
📝 Abstract
We study how inherent randomness in the training process, where each sample (or client in federated learning) contributes only to a randomly selected portion of training, can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g., model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce Balanced Iteration Subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for significant privacy amplification.
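To illustrate the data-partitioning side, the sketch below contrasts Poisson (i.i.d.) participation with Balanced Iteration Subsampling as we read it from the abstract: each client joins exactly `k` of the rounds. All names and parameters (`q`, `k`, `num_rounds`) are illustrative assumptions, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_participation(num_clients: int, num_rounds: int, q: float) -> np.ndarray:
    # Poisson (i.i.d.) subsampling: each client joins each round
    # independently with probability q, so per-client participation
    # counts are Binomial(num_rounds, q) and fluctuate across clients.
    return rng.random((num_clients, num_rounds)) < q

def balanced_iteration_subsampling(num_clients: int, num_rounds: int, k: int) -> np.ndarray:
    # Balanced Iteration Subsampling (our reading of the abstract):
    # each client participates in exactly k rounds, chosen uniformly
    # at random without replacement: structured, non-i.i.d. randomness.
    schedule = np.zeros((num_clients, num_rounds), dtype=bool)
    for client in range(num_clients):
        schedule[client, rng.choice(num_rounds, size=k, replace=False)] = True
    return schedule

# With q = k / num_rounds the two schemes match in expectation, but the
# balanced scheme has zero variance in per-client participation counts.
poisson = poisson_participation(num_clients=100, num_rounds=20, q=0.25)
balanced = balanced_iteration_subsampling(num_clients=100, num_rounds=20, k=5)
assert (balanced.sum(axis=1) == 5).all()
```

The fixed per-client participation count is exactly the kind of structured (rather than i.i.d.) randomness the abstract argues can be leveraged for stronger privacy amplification than Poisson sampling.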