Differentially Private Sparse Linear Regression with Heavy-tailed Responses

📅 2025-06-07

📈 Citations: 0

✨ Influential: 0

career value

230K/year

🤖 AI Summary

This paper addresses the challenging problem of estimating high-dimensional sparse linear regression models under differential privacy (DP) when response variables exhibit heavy-tailed distributions—circumventing restrictive assumptions of light-tailed errors or low-dimensional settings adopted by prior work. We propose two novel $(varepsilon,delta)$-DP algorithms: DP-IHT-H and DP-IHT-L, which achieve statistically optimal error rates—respectively dependent on and independent of the tail parameter $zeta$. Our approach integrates Huber loss for robustness, iterative hard thresholding (IHT) for sparsity recovery, private gradient clipping, and calibrated noise injection to ensure both statistical efficiency and rigorous privacy guarantees. Theoretical analysis establishes tighter statistical error bounds than existing DP linear regression methods. Extensive experiments on synthetic and real-world datasets demonstrate substantial improvements in the privacy–accuracy trade-off.

Technology Category

Application Category

📝 Abstract

As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of ( ilde{O}iggl( s^{* frac{1 }{2}} cdot iggl(frac{log d}{n}iggr)^{frac{zeta}{1 + zeta}} + s^{* frac{1 + 2zeta}{2 + 2zeta}} cdot iggl(frac{log^2 d}{n varepsilon}iggr)^{frac{zeta}{1 + zeta}} iggr) ) under the $(varepsilon, delta)$-DP model, where $n$ is the sample size, $d$ is the dimensionality, $s^*$ is the sparsity of the parameter, and $zeta in (0, 1]$ characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves ( ilde{O}Bigl(frac{(s^*)^{3/2} log d}{n varepsilon}Bigr). ) Compared to the first result, this bound is independent of the tail parameter $zeta$. Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for ``regular'' data.

Problem

Research questions and friction points this paper is trying to address.

DP sparse linear regression with heavy-tailed responses

High-dimensional settings with irregular data

Improving error bounds under differential privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Huber loss for heavy-tailed responses

Employs private iterative hard thresholding

Achieves improved error bounds under assumptions

🔎 Similar Papers

No similar papers found.