🤖 AI Summary
This paper addresses the challenging problem of estimating high-dimensional sparse linear regression models under differential privacy (DP) when response variables exhibit heavy-tailed distributions—circumventing restrictive assumptions of light-tailed errors or low-dimensional settings adopted by prior work. We propose two novel $(varepsilon,delta)$-DP algorithms: DP-IHT-H and DP-IHT-L, which achieve statistically optimal error rates—respectively dependent on and independent of the tail parameter $zeta$. Our approach integrates Huber loss for robustness, iterative hard thresholding (IHT) for sparsity recovery, private gradient clipping, and calibrated noise injection to ensure both statistical efficiency and rigorous privacy guarantees. Theoretical analysis establishes tighter statistical error bounds than existing DP linear regression methods. Extensive experiments on synthetic and real-world datasets demonstrate substantial improvements in the privacy–accuracy trade-off.
📝 Abstract
As a fundamental problem in machine learning and differential privacy (DP), DP linear regression has been extensively studied. However, most existing methods focus primarily on either regular data distributions or low-dimensional cases with irregular data. To address these limitations, this paper provides a comprehensive study of DP sparse linear regression with heavy-tailed responses in high-dimensional settings. In the first part, we introduce the DP-IHT-H method, which leverages the Huber loss and private iterative hard thresholding to achieve an estimation error bound of ( ilde{O}iggl( s^{* frac{1 }{2}} cdot iggl(frac{log d}{n}iggr)^{frac{zeta}{1 + zeta}} + s^{* frac{1 + 2zeta}{2 + 2zeta}} cdot iggl(frac{log^2 d}{n varepsilon}iggr)^{frac{zeta}{1 + zeta}} iggr) ) under the $(varepsilon, delta)$-DP model, where $n$ is the sample size, $d$ is the dimensionality, $s^*$ is the sparsity of the parameter, and $zeta in (0, 1]$ characterizes the tail heaviness of the data. In the second part, we propose DP-IHT-L, which further improves the error bound under additional assumptions on the response and achieves ( ilde{O}Bigl(frac{(s^*)^{3/2} log d}{n varepsilon}Bigr). ) Compared to the first result, this bound is independent of the tail parameter $zeta$. Finally, through experiments on synthetic and real-world datasets, we demonstrate that our methods outperform standard DP algorithms designed for ``regular'' data.