PRIMO: Private Regression in Multiple Outcomes

📅 2023-03-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
We address differentially private linear regression with multiple correlated response variables sharing common features—termed the PRIMO setting—where naively applying a single-response privacy mechanism $l$ times inflates estimation error by a multiplicative factor of $\sqrt{l}$, with $l$ the number of responses. This work formally introduces and analyzes the PRIMO setting for the first time. We propose a novel joint mechanism combining geometric projection onto a low-dimensional shared subspace with sufficient statistic perturbation (SSP): multi-task regression is first constrained to this subspace, and noise is then added to the compressed sufficient statistics. We establish a theoretical guarantee that the mean squared error of our estimator is asymptotically independent of $l$. Empirical evaluation on multi-trait genetic risk prediction demonstrates consistent and significant improvements over baselines across all $l$, with notable accuracy advantages even for small $l$.
📝 Abstract
We introduce a new private regression setting we call Private Regression in Multiple Outcomes (PRIMO), inspired by the common situation where a data analyst wants to perform a set of $l$ regressions while preserving privacy, where the features $X$ are shared across all $l$ regressions, and each regression $i \in [l]$ has a different vector of outcomes $y_i$. Naively applying existing private linear regression techniques $l$ times leads to a $\sqrt{l}$ multiplicative increase in error over the standard linear regression setting. We apply a variety of techniques including sufficient statistics perturbation (SSP) and geometric projection-based methods to develop scalable algorithms that outperform this baseline across a range of parameter regimes. In particular, we obtain no dependence on $l$ in the asymptotic error when $l$ is sufficiently large. Empirically, on the task of genomic risk prediction with multiple phenotypes we find that even for values of $l$ far smaller than the theory would predict, our projection-based method improves the accuracy relative to the variant that doesn't use the projection.
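The mechanism described above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the paper's implementation: the projection here is chosen as the top-$r$ right singular subspace of the outcome matrix, and `noise_scale` is a free parameter rather than being calibrated to a concrete $(\epsilon, \delta)$ privacy budget.

```python
import numpy as np

def primo_ssp_sketch(X, Y, r, noise_scale, rng=None):
    """Illustrative PRIMO-style estimator: project the l outcome
    columns of Y onto an r-dimensional subspace, perturb the
    compressed sufficient statistics (SSP), solve, and lift back.

    X : (n, d) shared feature matrix
    Y : (n, l) matrix of l outcome vectors
    r : dimension of the shared subspace (r <= l)
    noise_scale : std. dev. of the Gaussian perturbation
                  (not calibrated to a formal privacy budget here)
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape

    # Illustrative choice of projection: top-r right singular
    # subspace of Y (the paper's projection may differ).
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    P = Vt[:r].T                     # (l, r) projection matrix
    Y_proj = Y @ P                   # compress l responses to r

    # Sufficient statistics of the compressed regressions.
    XtX = X.T @ X
    XtY = X.T @ Y_proj

    # Sufficient statistic perturbation: add Gaussian noise,
    # symmetrized for X^T X so the perturbed matrix stays symmetric.
    N = rng.normal(scale=noise_scale, size=(d, d))
    XtX_priv = XtX + (N + N.T) / 2.0
    XtY_priv = XtY + rng.normal(scale=noise_scale, size=(d, r))

    # Solve the r compressed regressions (small ridge term keeps
    # the noisy system well-conditioned), then lift back to l outcomes.
    W_proj = np.linalg.solve(XtX_priv + 1e-6 * np.eye(d), XtY_priv)
    return W_proj @ P.T              # (d, l) coefficient matrix
```

Because noise is added to one $d \times d$ and one $d \times r$ statistic rather than to $l$ separate regressions, the noise cost scales with the subspace dimension $r$ instead of $l$, which is the intuition behind the error becoming asymptotically independent of $l$.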
Problem

Research questions and friction points this paper is trying to address.

Differential Privacy
Multi-Outcome Regression
Error Accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

PRIMO
Privacy-preserving regression
Multi-query analysis