An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise

📅 2025-08-14

📈 Citations: 0

✨ Influential: 0

career value

245K/year

🤖 AI Summary

This paper addresses the problem of estimating the top-$k$ principal components of a covariance matrix $Sigma$ from a collection of random matrices under differential privacy. Existing methods suffer from limitations: requiring superlinear sample complexity in dimension ($n gg d$), excessive noise injection, or applicability only to $k=1$. We propose the first efficient differentially private algorithm supporting arbitrary $k leq d$. Our method builds upon an iterative optimization framework with adaptive noise injection, leveraging intrinsic data randomness to reduce privacy cost. We establish theoretical guarantees showing near-optimal statistical error with only $n = ilde{O}(d)$ samples; for $k=1$, our error matches the information-theoretic lower bound. We further provide tight upper and lower bounds characterizing the fundamental trade-off. Experiments demonstrate that our approach significantly outperforms existing baselines in the privacy–utility trade-off.

Technology Category

Application Category

📝 Abstract

Given $n$ i.i.d. random matrices $A_i in mathbb{R}^{d imes d}$ that share a common expectation $Σ$, the objective of Differentially Private Stochastic PCA is to identify a subspace of dimension $k$ that captures the largest variance directions of $Σ$, while preserving differential privacy (DP) of each individual $A_i$. Existing methods either (i) require the sample size $n$ to scale super-linearly with dimension $d$, even under Gaussian assumptions on the $A_i$, or (ii) introduce excessive noise for DP even when the intrinsic randomness within $A_i$ is small. Liu et al. (2022a) addressed these issues for sub-Gaussian data but only for estimating the top eigenvector ($k=1$) using their algorithm DP-PCA. We propose the first algorithm capable of estimating the top $k$ eigenvectors for arbitrary $k leq d$, whilst overcoming both limitations above. For $k=1$ our algorithm matches the utility guarantees of DP-PCA, achieving near-optimal statistical error even when $n = ilde{!O}(d)$. We further provide a lower bound for general $k > 1$, matching our upper bound up to a factor of $k$, and experimentally demonstrate the advantages of our algorithm over comparable baselines.

Problem

Research questions and friction points this paper is trying to address.

Estimating top-k eigenvectors under differential privacy constraints

Overcoming sample size and noise limitations in private PCA

Achieving near-optimal error rates for k-dimensional subspace estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative algorithm for k-PCA with adaptive noise

Estimates top k eigenvectors for arbitrary k

Achieves near-optimal error with O(d) samples

🔎 Similar Papers

FastLloyd: Federated, Accurate, Secure, and Tunable k-Means Clustering with Differential Privacy