🤖 AI Summary
This paper addresses the challenge of simultaneously ensuring privacy preservation and statistical efficiency in high-dimensional dimensionality reduction. We propose the first differentially private sliced inverse regression (DP-SIR) algorithm. Methodologically, we derive minimax risk lower bounds for DP-SIR under both low- and high-dimensional settings and design a private estimator achieving these bounds up to logarithmic factors, integrating matrix perturbation analysis, random projection, and adaptive noise injection. Theoretically, we establish— for the first time—the fundamental statistical limits of SIR under differential privacy and extend our results naturally to characterize the privacy–utility trade-off in sparse principal component analysis. Empirical evaluations demonstrate that, across arbitrary privacy budgets ε, our algorithm’s subspace estimation error converges to the theoretical lower bound, consistently outperforming existing baselines and achieving optimal privacy–utility trade-offs on both synthetic and real-world datasets.
📝 Abstract
Privacy preservation has become a critical concern in high-dimensional data analysis due to the growing prevalence of data-driven applications. Since its proposal, sliced inverse regression has emerged as a widely utilized statistical technique to reduce the dimensionality of covariates while maintaining sufficient statistical information. In this paper, we propose optimally differentially private algorithms specifically designed to address privacy concerns in the context of sufficient dimension reduction. We establish lower bounds for differentially private sliced inverse regression in low and high dimensional settings. Moreover, we develop differentially private algorithms that achieve the minimax lower bounds up to logarithmic factors. Through a combination of simulations and real data analysis, we illustrate the efficacy of these differentially private algorithms in safeguarding privacy while preserving vital information within the reduced dimension space. As a natural extension, we can readily offer analogous lower and upper bounds for differentially private sparse principal component analysis, a topic that may also be of potential interest to the statistics and machine learning community.