🤖 AI Summary
This work addresses the challenge of establishing rigorous differential privacy (DP) guarantees when injecting heavy-tailed noise—specifically, α-stable noise (including infinite-variance cases)—into stochastic gradient descent (SGD). Prior analyses typically rely on gradient clipping, bounded gradients, or convexity assumptions. We provide the first $(\epsilon,\delta)$-DP guarantee for SGD with α-stable noise under **no gradient clipping**, **no gradient norm boundedness assumption**, and **non-convex loss functions**. Our key theoretical contribution is proving that α-stable noise alone achieves $(0, O(1/n))$-DP, demonstrating that projection or clipping steps are often unnecessary. Furthermore, we unify the privacy–optimization trade-off analysis for both heavy-tailed (e.g., α-stable) and light-tailed (e.g., Gaussian) noise, showing that heavy-tailed noise serves as an effective, theoretically justified alternative to Gaussian noise. These results establish a more general and practical foundation for privacy-preserving optimization in unconstrained settings.
📝 Abstract
The injection of heavy-tailed noise into the iterates of stochastic gradient descent (SGD) has garnered growing interest in recent years due to its theoretical and empirical benefits for optimization and generalization. However, its implications for privacy preservation remain largely unexplored. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD when the injected noise follows an $\alpha$-stable distribution, which includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the light-tailed Gaussian distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, O(1/n))$-DP for a broad class of loss functions that can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, contrary to prior work that necessitates bounded sensitivity for the gradients or clipping the iterates, our theory can handle unbounded gradients without clipping, and reveals that under mild assumptions, such a projection step is not actually necessary. Our results suggest that, given other benefits of heavy tails in optimization, heavy-tailed noising schemes can be a viable alternative to their light-tailed counterparts.
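As a concrete illustration of the noising scheme described above, here is a minimal sketch of SGD with symmetric $\alpha$-stable noise added to each iterate, with no gradient clipping and no projection step. The toy quadratic loss, the single-sample minibatching, and all hyperparameter values are illustrative assumptions, not specifics from the paper; the noise sampler uses the standard Chambers–Mallows–Stuck construction.

```python
import numpy as np


def sample_sym_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise
    (beta = 0, alpha != 1). Setting alpha = 2 recovers a Gaussian
    (up to scale), matching the light-tailed end of the spectrum."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1.0 - alpha) / alpha))


def noisy_sgd(grad_fn, x0, data, lr=0.05, sigma=0.1, alpha=1.8,
              n_steps=200, seed=0):
    """SGD with alpha-stable perturbations on each iterate.

    Gradients are used unclipped and iterates are never projected,
    mirroring the unconstrained setting analyzed in the abstract.
    Step size, noise scale, and one data point per step are
    illustrative choices only.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    n = len(data)
    for _ in range(n_steps):
        i = rng.integers(n)              # uniformly sampled data point
        g = grad_fn(x, data[i])          # stochastic gradient (no clipping)
        noise = sample_sym_alpha_stable(alpha, x.shape, rng)
        x = x - lr * g + lr * sigma * noise
    return x


# Toy example: quadratic losses f_i(x) = 0.5 * (x - d_i)^2,
# so grad_fn(x, d) = x - d.
data = np.array([0.5, 1.0, 1.5])
x_final = noisy_sgd(lambda x, d: x - d, np.zeros(1), data)
```

With `alpha` close to 2 the perturbations behave almost like Gaussian noise, while smaller values of `alpha` produce occasional large jumps characteristic of heavy tails; the privacy guarantee in the paper covers this whole family.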