Sample-Optimal Private Regression in Polynomial Time

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies private linear regression under Gaussian covariates with unknown covariance structure, aiming for optimal prediction error and sample complexity under pure or approximate differential privacy, while remaining robust to arbitrary outliers. Methodologically, the authors propose the first polynomial-time, sample-optimal (tightly matching the information-theoretic lower bound), condition-number-agnostic private regression algorithm. They introduce a "covariance-aware robust-to-private transformation" framework that integrates sum-of-squares (SoS) robustness analysis with geometric covariance adaptation. The algorithm achieves optimal statistical error rates even when a small fraction of the samples are arbitrary outliers. As a byproduct, they derive a covariance-aware optimal private mean estimator. The approach eliminates reliance on strong assumptions about the covariance matrix, such as a bounded condition number, while preserving statistical efficiency and computational tractability under privacy constraints.
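To build intuition for the "covariance-aware" idea, here is a minimal, hypothetical sketch of a private mean estimator that works in the geometry induced by the empirical covariance (Mahalanobis norm) rather than the Euclidean norm: whiten, clip, apply the standard Gaussian mechanism, and map back. This is an illustration only, not the paper's algorithm; in particular, the whitening step here uses the covariance non-privately, which a real end-to-end private algorithm must avoid, and all parameter names are invented.

```python
import numpy as np

def private_mean_covariance_aware(X, eps, delta, clip_radius, rng=None):
    """Toy sketch of covariance-aware private mean estimation.

    Whitens the samples by the empirical covariance so that clipping and
    noising happen in the Mahalanobis geometry, then maps the noisy mean
    back to the original coordinates. Illustrative only: the whitening
    itself is not privatized here.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Empirical covariance, regularized for numerical stability (non-private).
    Sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)
    L = np.linalg.cholesky(Sigma)
    Z = np.linalg.solve(L, X.T).T              # whitened samples, shape (n, d)
    # Clip each whitened sample to radius clip_radius to bound sensitivity.
    norms = np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    Z = Z * np.minimum(1.0, clip_radius / norms)
    # Gaussian mechanism: the L2 sensitivity of the clipped mean is
    # 2 * clip_radius / n under swap of one sample.
    sigma = (2 * clip_radius / n) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    noisy_mean = Z.mean(axis=0) + rng.normal(0.0, sigma, size=d)
    # Undo the whitening to return to the original coordinates.
    return L @ noisy_mean
```

The point of the sketch is the order of operations: because clipping happens after whitening, the amount of signal lost to clipping depends on the data's own geometry, not on the condition number of the covariance.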

📝 Abstract
We consider the task of privately obtaining prediction error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaling with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are two-fold: first, we leverage resilience guarantees of Gaussians within the sum-of-squares framework. As a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework [HKMN23, (arXiv:2212.05015)] to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators to be sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation, with an optimal dependence on the privacy parameters.
Problem

Research questions and friction points this paper is trying to address.

Private prediction error guarantees in Gaussian OLS regression
Sample-optimal polynomial time algorithm under differential privacy
Robust regression with error rates optimal in the fraction of outliers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sample-optimal polynomial time private regression
Sum-of-squares framework for robust regression
Geometry-aware robustness-to-privacy generalization
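For intuition about what "robust regression" means here, the following toy sketch fits OLS while iteratively discarding the samples with the largest residuals. This is a crude, hypothetical stand-in for the paper's sum-of-squares machinery (which comes with provable guarantees this heuristic lacks); the function name and parameters are invented for illustration.

```python
import numpy as np

def trimmed_ols(X, y, outlier_frac, iters=10):
    """Toy robust regression: repeatedly refit OLS on the (1 - outlier_frac)
    fraction of samples with the smallest absolute residuals.

    A heuristic illustration of robustness to a small fraction of arbitrary
    outliers; not the paper's algorithm and not provably robust in general.
    """
    n = X.shape[0]
    k = int(np.ceil((1.0 - outlier_frac) * n))   # number of samples to keep
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # initial fit on all data
    for _ in range(iters):
        resid = np.abs(y - X @ beta)
        keep = np.argsort(resid)[:k]             # drop the largest residuals
        beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return beta
```

On data where a small fraction of responses are grossly corrupted, the trimmed fit typically recovers the true coefficients while plain least squares is pulled off by the outliers.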