Lap2: Revisiting Laplace DP-SGD for High Dimensions via Majorization Theory

📅 2026-02-26
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the poor utility of the Laplace mechanism in high-dimensional DP-SGD, which stems from its reliance on L1-norm gradient clipping and the consequently excessive noise. Leveraging majorization theory and Schur-convexity, the authors derive, for the first time, a tight, data-independent, coordinate-wise moment bound compatible with L2-norm clipping. This enables the Laplace mechanism to operate under L2 constraints and supports privacy accounting with moments of order up to several thousand, effectively overcoming the dimensional barrier. Under a strong privacy budget (ε = 0.54), fine-tuning RoBERTa-base on SST-2 achieves 87.88% accuracy, significantly outperforming both the Gaussian mechanism (87.16%) and the standard Laplace mechanism (48.97%).
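To ground the summary, here is a minimal, hypothetical sketch of what a single Lap2-style DP-SGD step could look like: L2-norm clipping followed by per-coordinate Laplace noise. The function name `lap2_sgd_step` and the parameters `lr`, `clip_norm`, and `noise_scale` are illustrative assumptions, not the paper's code; in particular, the noise scale would in practice be calibrated through the paper's multivariate moment accountant, which is not reproduced here.

```python
import numpy as np

def lap2_sgd_step(params, grad, lr=0.1, clip_norm=1.0, noise_scale=1.0):
    """One privatized step in the style of Lap2: L2-norm clipping of the
    gradient followed by per-coordinate Laplace noise.

    noise_scale is a placeholder; the paper calibrates it via a
    multivariate moment accountant that is not reproduced here.
    """
    # Clip to L2 norm at most clip_norm -- the key departure from classical
    # Laplace DP-SGD, which would clip the (much larger) L1 norm.
    l2 = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(l2, 1e-12))

    # Add i.i.d. Laplace noise to every coordinate.
    noisy = clipped + np.random.laplace(loc=0.0, scale=noise_scale,
                                        size=grad.shape)

    return params - lr * noisy
```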

📝 Abstract
Differentially Private Stochastic Gradient Descent (DP-SGD) is a cornerstone technique for ensuring privacy in deep learning, widely used both in training from scratch and in fine-tuning large-scale language models. While DP-SGD predominantly relies on the Gaussian mechanism, the Laplace mechanism remains underutilized due to its reliance on L1-norm clipping. This constraint severely limits its practicality in high-dimensional models because the L1 norm of an n-dimensional gradient can be up to sqrt(n) times larger than its L2 norm. As a result, the required noise scale grows significantly with model size, leading to poor utility or untrainable models. In this work, we introduce Lap2, a new solution that enables L2 clipping for Laplace DP-SGD while preserving strong privacy guarantees. We overcome the dimensionality-driven clipping barrier by computing coordinate-wise moment bounds and applying majorization theory to construct a tight, data-independent upper bound over the full model. By exploiting the Schur-convexity of the moment accountant function, we aggregate these bounds using a carefully designed majorization set that respects the L2 clipping constraint. This yields a multivariate privacy accountant that scales gracefully with model dimension and enables the use of thousands of moments. Empirical evaluations demonstrate that our approach significantly improves the performance of Laplace DP-SGD, achieving results comparable to or better than Gaussian DP-SGD under strong privacy constraints. For instance, fine-tuning RoBERTa-base (125M parameters) on SST-2 achieves 87.88% accuracy at ε = 0.54, outperforming Gaussian (87.16%) and standard Laplace (48.97%) under the same budget.
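The sqrt(n) gap the abstract describes is easy to verify numerically. The snippet below (illustrative only, not from the paper) constructs the worst case for L1-based sensitivity, a gradient whose mass is spread evenly across coordinates, and shows the L1 norm growing as sqrt(n) times the L2 norm:

```python
import numpy as np

# Worst case for L1 clipping: equal-magnitude coordinates with unit L2 norm.
# Then ||g||_1 = sqrt(n) * ||g||_2, so Laplace noise calibrated to the L1
# norm must grow with the model dimension n.
for n in (100, 10_000, 1_000_000):
    g = np.full(n, 1.0 / np.sqrt(n))            # unit L2 norm
    l1, l2 = np.abs(g).sum(), np.linalg.norm(g)
    print(f"n={n:>9,}  ||g||_2={l2:.2f}  ||g||_1={l1:8.1f}  "
          f"ratio={l1 / l2:8.1f}")
```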
Problem

Research questions and friction points this paper is trying to address.

Differential Privacy
Laplace Mechanism
High-Dimensional Models
L1 Norm Clipping
DP-SGD
Innovation

Methods, ideas, or system contributions that make the work stand out. A brief background sketch of majorization follows the list.

Laplace DP-SGD
L2 clipping
majorization theory
Schur-convexity
privacy accounting
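As background for the majorization and Schur-convexity items above (standard definitions, not the paper's accountant): a vector x majorizes y when, after sorting both in descending order, every partial sum of x dominates the corresponding partial sum of y and the totals agree; any Schur-convex function f then satisfies f(y) <= f(x). A minimal check:

```python
import numpy as np

def majorizes(x, y, tol=1e-9):
    """True if x majorizes y: descending partial sums of x dominate
    those of y, and the totals agree."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    return (abs(xs.sum() - ys.sum()) < tol
            and bool(np.all(np.cumsum(xs) >= np.cumsum(ys) - tol)))

# [1, 0, 0, 0] majorizes the uniform [0.25] * 4; a Schur-convex function
# such as the sum of squares is therefore at least as large on the former.
x, y = np.array([1.0, 0.0, 0.0, 0.0]), np.full(4, 0.25)
print(majorizes(x, y))              # True
print((x**2).sum(), (y**2).sum())   # 1.0 >= 0.25
```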
Authors

Meisam Mohammady, Assistant Professor at Iowa State University (Differential Privacy, Federated Machine Learning, Secure Multiparty Computation)
Qin Yang, University of Connecticut
Nicholas Stout, Iowa State University
Ayesha Samreen, Iowa State University
Han Wang, University of Kansas
Christopher J Quinn, Iowa State University
Yuan Hong, University of Connecticut (Security, Privacy, AI Security, Applied Cryptography)