🤖 AI Summary
This paper addresses the challenges of differentially private (DP) parameter estimation, statistical inference, and multiple testing control in high-dimensional linear regression. To this end, we propose a unified framework comprising three key components: (i) the first DP-protected Bayesian Information Criterion (BIC) for adaptive model sparsity selection; (ii) a privacy-preserving debiased LASSO estimator enabling unbiased parameter estimation and valid confidence interval construction under DP; and (iii) the first provably false discovery rate (FDR)-controlled DP multiple testing procedure, built upon a privacy-adapted Benjamini–Hochberg algorithm. Theoretical analysis establishes rigorous DP guarantees, statistical efficiency, and exact FDR control at the nominal level. Empirical evaluation demonstrates substantial improvements over state-of-the-art DP baselines: 42% higher sparsity identification accuracy, confidence interval coverage approaching the nominal level, and stable FDR containment strictly below the pre-specified threshold.
📝 Abstract
This paper presents novel methodologies for conducting practical differentially private (DP) estimation and inference in high-dimensional linear regression. We start by proposing a differentially private Bayesian Information Criterion (BIC) for selecting the unknown sparsity parameter in DP-Lasso, eliminating the need for prior knowledge of model sparsity, a requisite in the existing literature. Then we propose a differentially private debiased LASSO algorithm that enables privacy-preserving inference on regression parameters. Our proposed method enables accurate and private inference on the regression parameters by leveraging the inherent sparsity of high-dimensional linear regression models. Additionally, we address the issue of multiple testing in high-dimensional linear regression by introducing a differentially private multiple testing procedure that controls the false discovery rate (FDR). This allows for accurate and privacy-preserving identification of significant predictors in the regression model. Through extensive simulations and real data analysis, we demonstrate the efficacy of our proposed methods in conducting inference for high-dimensional linear models while safeguarding privacy and controlling the FDR.