Differentially Private Sparse Linear Regression with Heavy-tailed Responses

📅 2025-06-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenging problem of estimating high-dimensional sparse linear regression models under differential privacy (DP) when response variables exhibit heavy-tailed distributions, circumventing the restrictive assumptions of light-tailed errors or low-dimensional settings adopted by prior work. The authors propose two novel $(\varepsilon,\delta)$-DP algorithms, DP-IHT-H and DP-IHT-L, which achieve statistically optimal error rates that are, respectively, dependent on and independent of the tail parameter $\zeta$. The approach integrates the Huber loss for robustness, iterative hard thresholding (IHT) for sparsity recovery, private gradient clipping, and calibrated noise injection to ensure both statistical efficiency and rigorous privacy guarantees. Theoretical analysis establishes tighter statistical error bounds than existing DP linear regression methods, and extensive experiments on synthetic and real-world datasets demonstrate substantial improvements in the privacy–accuracy trade-off.
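The pipeline sketched in the summary (Huber-loss gradients, per-sample clipping, Gaussian noise, hard thresholding) can be illustrated as a single private IHT iteration. This is a minimal sketch under assumed parameterization; the function names, step size `eta`, and noise scale `sigma` are illustrative placeholders, not the paper's exact algorithm or calibration:

```python
import numpy as np

def huber_grad(residual, tau):
    # Derivative of the Huber loss w.r.t. the prediction:
    # the residual, clipped to [-tau, tau] in the heavy tails.
    return np.clip(residual, -tau, tau)

def dp_iht_step(theta, X, y, s, tau, clip_norm, sigma, eta, rng):
    # Per-sample Huber-loss gradients, shape (n, d).
    grads = huber_grad(X @ theta - y, tau)[:, None] * X
    # Clip each per-sample gradient to bound the sensitivity of the mean.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads /= np.maximum(1.0, norms / clip_norm)
    # Noisy gradient step: Gaussian noise (scale assumed calibrated for DP).
    noisy_grad = grads.mean(axis=0) + rng.normal(0.0, sigma, size=theta.shape)
    theta = theta - eta * noisy_grad
    # Iterative hard thresholding: keep only the s largest-magnitude entries.
    support = np.argsort(np.abs(theta))[-s:]
    sparse_theta = np.zeros_like(theta)
    sparse_theta[support] = theta[support]
    return sparse_theta
```

Every iterate stays $s$-sparse, which is why the error bounds scale with $s^*$ and $\log d$ rather than the ambient dimension $d$.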

📝 Abstract
As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of $\tilde{O}\biggl( s^{*\frac{1}{2}} \cdot \biggl(\frac{\log d}{n}\biggr)^{\frac{\zeta}{1 + \zeta}} + s^{*\frac{1 + 2\zeta}{2 + 2\zeta}} \cdot \biggl(\frac{\log^2 d}{n \varepsilon}\biggr)^{\frac{\zeta}{1 + \zeta}} \biggr)$ under the $(\varepsilon, \delta)$-DP model, where $n$ is the sample size, $d$ is the dimensionality, $s^*$ is the sparsity of the parameter, and $\zeta \in (0, 1]$ characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves $\tilde{O}\Bigl(\frac{(s^*)^{3/2} \log d}{n \varepsilon}\Bigr)$. Compared to the first result, this bound is independent of the tail parameter $\zeta$. Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for "regular" data.
Problem

Research questions and friction points this paper is trying to address.

DP sparse linear regression with heavy-tailed responses
High-dimensional settings with irregular data
Improving error bounds under differential privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Huber loss for heavy-tailed responses
Employs private iterative hard thresholding
Achieves improved error bounds under assumptions
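To make the first innovation concrete: the Huber loss is quadratic near zero and linear in the tails, so its gradient magnitude is capped at the threshold $\tau$, which is what makes it robust to heavy-tailed responses and compatible with bounded-sensitivity private updates. The snippet below uses the standard textbook parameterization, which may differ from the paper's exact formulation:

```python
import numpy as np

def huber_loss(r, tau):
    # Squared loss for |r| <= tau, linear loss beyond;
    # the gradient magnitude is therefore bounded by tau.
    a = np.abs(r)
    return np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2)
```

Unlike the squared loss, a single extreme residual contributes only linearly here, so heavy-tailed outliers cannot dominate the gradient.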
Xizhi Tian
Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology; Utrecht University
Meng Ding
University at Buffalo
Trustworthy Statistical Learning
Touming Tao
Technical University Berlin
Zihang Xiang
University of California, Los Angeles (UCLA)
Data Privacy, Differential Privacy
Di Wang
Provable Responsible AI and Data Analytics (PRADA) Lab, King Abdullah University of Science and Technology