🤖 AI Summary
This paper studies private linear regression with Gaussian covariates whose covariance structure is unknown, aiming for optimal prediction error and sample complexity under pure or approximate differential privacy while remaining robust to arbitrary outliers. The authors propose the first private regression algorithm that is simultaneously polynomial-time, sample-optimal (matching the information-theoretic lower bound), and agnostic to the condition number of the covariance. They introduce a covariance-aware robustness-to-privacy transformation that integrates sum-of-squares (SoS) robustness analysis with geometric adaptation to the covariance of the samples. The algorithm achieves optimal statistical error rates even when a small fraction of the samples are arbitrary outliers, and as a byproduct yields a covariance-aware optimal private mean estimator. The approach eliminates reliance on strong assumptions about the covariance matrix, such as a bounded condition number, while preserving statistical efficiency and computational tractability under privacy constraints.
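To give a flavor of what "covariance-aware" means here, the sketch below shapes Gaussian-mechanism noise to the Mahalanobis geometry of an estimated covariance instead of the identity. This is only an illustrative toy, not the paper's algorithm: the function name `covariance_aware_private_mean`, the clipping threshold `tau`, and the non-private covariance estimate are all our own simplifications (a genuinely private algorithm must also privatize the covariance step).

```python
# Illustrative sketch ONLY -- not the paper's algorithm. Shows the idea of
# calibrating Gaussian-mechanism noise to the covariance geometry of the data.
import numpy as np

def covariance_aware_private_mean(X, eps, delta, tau, rng):
    """Mean estimate with Gaussian noise shaped by the empirical covariance.

    Caveat: Sigma_hat is computed non-privately here for simplicity, so this
    sketch is NOT actually differentially private end to end.
    """
    n, d = X.shape
    Sigma_hat = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)  # regularized
    L = np.linalg.cholesky(Sigma_hat)
    Linv = np.linalg.inv(L)
    # Whiten, then clip each sample to Mahalanobis norm <= tau. Clipping
    # bounds the l2-sensitivity of the mean of the whitened data by 2*tau/n.
    Y = X @ Linv.T
    norms = np.maximum(np.linalg.norm(Y, axis=1), 1e-12)
    Y = Y * np.minimum(1.0, tau / norms)[:, None]
    # Standard Gaussian-mechanism noise scale for (eps, delta)-DP.
    sigma = (2 * tau / n) * np.sqrt(2 * np.log(1.25 / delta)) / eps
    noisy_mean_Y = Y.mean(axis=0) + rng.normal(0.0, sigma, size=d)
    return noisy_mean_Y @ L.T  # map back to the original geometry
```

Because the noise is added in the whitened space, its magnitude in each direction automatically scales with the spread of the data in that direction; this is the sense in which the error guarantee can avoid depending on the condition number.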
📝 Abstract
We consider the task of privately obtaining prediction-error guarantees in ordinary least-squares regression problems with Gaussian covariates (with unknown covariance structure). We provide the first sample-optimal polynomial-time algorithm for this task under both pure and approximate differential privacy. We show that any improvement to the sample complexity of our algorithm would violate either statistical-query or information-theoretic lower bounds. Additionally, our algorithm is robust to a small fraction of arbitrary outliers and achieves optimal error rates as a function of the fraction of outliers. In contrast, all prior efficient algorithms either incurred sample complexities with sub-optimal dimension dependence, scaled with the condition number of the covariates, or obtained a polynomially worse dependence on the privacy parameters. Our technical contributions are two-fold. First, we leverage resilience guarantees of Gaussians within the sum-of-squares framework; as a consequence, we obtain efficient sum-of-squares algorithms for regression with optimal robustness rates and sample complexity. Second, we generalize the recent robustness-to-privacy framework of [HKMN23] (arXiv:2212.05015) to account for the geometry induced by the covariance of the input samples. This framework crucially relies on the robust estimators being sum-of-squares algorithms, and combining the two steps yields a sample-optimal private regression algorithm. We believe our techniques are of independent interest, and we demonstrate this by obtaining an efficient algorithm for covariance-aware mean estimation with an optimal dependence on the privacy parameters.
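To make the target guarantee concrete: for Gaussian covariates with covariance Sigma, the excess prediction error of an estimate beta_hat is the Mahalanobis error (beta_hat - beta)^T Sigma (beta_hat - beta), which for plain (non-private, non-robust) OLS concentrates around sigma^2 * d / n. The toy below only illustrates this metric and baseline; all names and parameter choices are ours, and it implements none of the paper's private or robust machinery.

```python
# Toy illustration of the problem setup (NOT the paper's algorithm):
# the prediction-error metric for regression with Gaussian covariates.
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 5
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)  # unknown, possibly ill-conditioned covariance
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
beta = rng.normal(size=d)
y = X @ beta + rng.normal(0.0, 1.0, size=n)  # label noise level sigma = 1

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# Excess prediction error = Mahalanobis distance in the covariance geometry;
# for OLS it is of order sigma^2 * d / n regardless of the conditioning of Sigma.
excess = (beta_hat - beta) @ Sigma @ (beta_hat - beta)
print(excess)
```

The difficulty the abstract addresses is attaining error of this order under differential privacy and adversarial outliers, without the algorithm's sample complexity degrading with the condition number of Sigma.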