🤖 AI Summary
This work establishes a theoretical connection between privacy and generalization in differentially private stochastic gradient descent (DP-SGD). Combining differential privacy, max-information analysis, and the PAC-Bayes framework, it proves a finite-sample bound on the approximate max-information of DP-SGD that grows at most linearly with the dataset size. Two generalization results follow from this bound: a general-purpose PAC-Bayes bound whose prior distribution can itself be learned by DP-SGD, and a bound for DP-SGD-trained models whose complexity term is fully explicit and controlled by the optimization hyperparameters. Together these results clarify the privacy–generalization trade-off and give DP-SGD models generalization guarantees that can be computed and tuned through the training configuration.
📝 Abstract
Understanding the relationship between generalization and privacy remains a central challenge in modern machine learning theory, particularly for deep networks trained by variants of differentially private stochastic gradient descent (DP-SGD). In this work, we make progress on this persistent open problem by proving a finite-sample bound on the approximate max-information of DP-SGD whose scaling is comparable to the classic result of Dwork et al. (2015) for $\varepsilon$-differentially private algorithms, namely at most linear in the dataset size. From this result we obtain a general-purpose PAC-Bayes generalization bound in which the required prior distribution can be learned by DP-SGD, as well as a generalization bound for DP-SGD-trained models themselves, with a complexity term that is fully explicit and controlled by the optimization hyperparameters.
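For reference, the quantity being bounded can be written out as follows. This is a sketch of the standard definition of approximate max-information from Dwork et al. (2015), stated here in nats. The symbols $S$ (a dataset of $n$ samples), $\mathcal{A}$ (the learning algorithm), $\beta$ (the approximation parameter), and $\mathcal{O}$ (an event over dataset-output pairs) are notation assumed for illustration and may differ from the paper's own.

$$
I_\infty^{\beta}\bigl(S; \mathcal{A}(S)\bigr)
= \log \sup_{\mathcal{O} \,:\, \Pr[(S, \mathcal{A}(S)) \in \mathcal{O}] > \beta}
\frac{\Pr[(S, \mathcal{A}(S)) \in \mathcal{O}] - \beta}{\Pr[S \otimes \mathcal{A}(S) \in \mathcal{O}]}
$$

Here $S \otimes \mathcal{A}(S)$ denotes a pair drawn independently from the two marginal distributions. For an $\varepsilon$-differentially private algorithm, the classic result gives $I_\infty\bigl(S; \mathcal{A}(S)\bigr) \le \varepsilon n$ in the case $\beta = 0$, which is the linear-in-$n$ scaling the abstract refers to.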