🤖 AI Summary
In user-level differentially private stochastic convex optimization (DP-SCO), iterative noise injection causes noise to accumulate and degrades utility. Method: This paper introduces robust statistics, specifically the median and trimmed mean, into gradient aggregation for the first time, coupled with a refined sensitivity analysis and adaptive privacy budget allocation. Contribution/Results: The proposed method achieves an optimal privacy–utility trade-off in linear $O(n)$ time. The paper establishes an $O(1/\sqrt{n})$ convergence rate (up to logarithmic factors) under $\varepsilon$-differential privacy and proves its tightness via an information-theoretic lower bound. Compared to existing DP-SGD-based approaches, the method significantly reduces gradient estimation variance while maintaining strong privacy guarantees and high optimization accuracy.
📝 Abstract
User-level differentially private stochastic convex optimization (DP-SCO) has garnered significant attention due to the paramount importance of safeguarding user privacy in modern large-scale machine learning applications. Current methods, such as those based on differentially private stochastic gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility due to the need to privatize every intermediate iterate. In this work, we introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges. Our approach uniquely bounds the sensitivity of all intermediate SGD iterates by estimating gradients with robust statistics, thereby significantly reducing the noise required for privacy and enhancing the privacy-utility trade-off. By sidestepping the repeated privatization required by previous methods, our algorithm not only achieves an improved theoretical privacy-utility trade-off but also maintains computational efficiency. We complement our algorithm with an information-theoretic lower bound, showing that our upper bound is optimal up to logarithmic factors and the dependence on $\epsilon$. This work sets the stage for more robust and efficient privacy-preserving techniques in machine learning, with implications for future research and application in the field.
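To make the robust aggregation step concrete, here is a minimal sketch of coordinate-wise median and trimmed-mean aggregation of per-user gradients. The function names, the trimming fraction, and the use of NumPy are illustrative assumptions, not the paper's actual implementation; the paper additionally couples such estimators with sensitivity analysis and calibrated noise, which this sketch omits.

```python
import numpy as np

def trimmed_mean(grads, trim_frac=0.1):
    """Coordinate-wise trimmed mean of per-user gradients.

    grads: (n_users, d) array, one averaged gradient per user.
    trim_frac: fraction of users trimmed from each tail, per coordinate.
    (Illustrative sketch; not the paper's implementation.)
    """
    n = grads.shape[0]
    k = int(np.floor(trim_frac * n))   # users dropped from each tail
    sorted_g = np.sort(grads, axis=0)  # sort each coordinate independently
    kept = sorted_g[k:n - k]           # discard k smallest and k largest
    return kept.mean(axis=0)

def median_aggregate(grads):
    """Coordinate-wise median of per-user gradients."""
    return np.median(grads, axis=0)

# A single extreme user barely moves the robust estimates:
grads = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])
print(trimmed_mean(grads, trim_frac=0.2))  # drops 0.0 and 100.0
print(median_aggregate(grads))
```

Because a single user can shift these statistics only within a bounded range (unlike the mean, which an outlier can move arbitrarily), their sensitivity stays small, which is what allows less privacy noise per step.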