🤖 AI Summary
Federated learning inherently involves structured stochasticity, such as partial client participation across iterations and random subnetwork updates, that induces privacy amplification unaccounted for by conventional differential privacy (DP) analyses.
Method: We propose Balanced Iteration Subsampling to replace Poisson sampling, enabling more accurate modeling of the training mechanism's structured, non-i.i.d. randomness; we systematically formalize the privacy amplification afforded by model partitioning, Dropout, and stochastic subnetwork updates.
Contribution/Results: We theoretically prove and empirically validate that such structured randomness yields significant, quantifiable, and previously overlooked privacy gains, enhancing DP guarantees without sacrificing model utility. Our work establishes the first unified analytical framework jointly incorporating model parallelism and data sampling into privacy amplification analysis, bridging a critical gap between practical federated training dynamics and rigorous DP accounting.
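For intuition, here is a minimal sketch of the model-partitioning mechanism summarized above, assuming a Dropout-style independent coordinate mask; the function names and the `keep_rate` parameter are illustrative assumptions, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_subnetwork_mask(dim: int, keep_rate: float) -> np.ndarray:
    # Dropout-style mask: each parameter coordinate is selected for
    # update independently with probability keep_rate (hypothetical name).
    return rng.random(dim) < keep_rate

def masked_client_update(params: np.ndarray, grad: np.ndarray,
                         mask: np.ndarray, lr: float) -> np.ndarray:
    # The client touches only the masked coordinates, so its data
    # influences just a random subset of the model parameters.
    update = np.zeros_like(params)
    update[mask] = -lr * grad[mask]
    return update

# Example: a client round that updates roughly half of a 10-dim model.
params = np.zeros(10)
grad = np.ones(10)
mask = random_subnetwork_mask(dim=10, keep_rate=0.5)
new_params = params + masked_client_update(params, grad, mask, lr=0.1)
```

Because each sample's gradient only ever reaches the randomly masked coordinates, an observer of the released model sees a noisier, partial view of that sample's contribution, which is the intuition behind the amplification gain.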
📝 Abstract
We study how inherent randomness in the training process, where each sample (or client in federated learning) contributes only to a randomly selected portion of training, can be leveraged for privacy amplification. This includes (1) data partitioning, where a sample participates in only a subset of training iterations, and (2) model partitioning, where a sample updates only a subset of the model parameters. We apply our framework to model parallelism in federated learning, where each client updates a randomly selected subnetwork to reduce memory and computational overhead, and show that existing methods, e.g., model splitting or dropout, provide a significant privacy amplification gain not captured by previous privacy analysis techniques. Additionally, we introduce Balanced Iteration Subsampling, a new data partitioning method where each sample (or client) participates in a fixed number of training iterations. We show that this method yields stronger privacy amplification than Poisson (i.i.d.) sampling of data (or clients). Our results demonstrate that randomness in the training process, which is structured rather than i.i.d. and interacts with data in complex ways, can be systematically leveraged for significant privacy amplification.
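To illustrate the data-partitioning side, the sketch below contrasts Poisson (i.i.d.) participation with Balanced Iteration Subsampling as we read it from the abstract: each client joins exactly `k` of the rounds. All names and parameters (`q`, `k`, `num_rounds`) are illustrative assumptions, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_participation(num_clients: int, num_rounds: int, q: float) -> np.ndarray:
    # Poisson (i.i.d.) subsampling: each client joins each round
    # independently with probability q, so per-client participation
    # counts are Binomial(num_rounds, q) and fluctuate across clients.
    return rng.random((num_clients, num_rounds)) < q

def balanced_iteration_subsampling(num_clients: int, num_rounds: int, k: int) -> np.ndarray:
    # Balanced Iteration Subsampling (our reading of the abstract):
    # each client participates in exactly k rounds, chosen uniformly
    # at random without replacement: structured, non-i.i.d. randomness.
    schedule = np.zeros((num_clients, num_rounds), dtype=bool)
    for client in range(num_clients):
        schedule[client, rng.choice(num_rounds, size=k, replace=False)] = True
    return schedule

# With q = k / num_rounds the two schemes match in expectation, but the
# balanced scheme has zero variance in per-client participation counts.
poisson = poisson_participation(num_clients=100, num_rounds=20, q=0.25)
balanced = balanced_iteration_subsampling(num_clients=100, num_rounds=20, k=5)
assert (balanced.sum(axis=1) == 5).all()
```

The fixed per-client participation count is exactly the kind of structured (rather than i.i.d.) randomness the abstract argues can be leveraged for stronger privacy amplification than Poisson sampling.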