🤖 AI Summary
This paper addresses the challenge of false discovery rate (FDR) control in variable selection under privacy constraints. The authors propose DP-knockoff, the first framework integrating differential privacy (DP) with model-X knockoffs. Methodologically, they design an ε-differentially private data perturbation mechanism that strictly protects the privacy of the original data while generating valid knockoff variables and retaining exact finite-sample FDR control. Theoretically, under mild regularity conditions, they prove that the privacy-induced noise does not impair the asymptotic optimality of statistical power: FDR is rigorously controlled at the pre-specified level α, and the power loss vanishes as the sample size grows. Empirical evaluations confirm robust performance in both high- and low-dimensional settings, delivering simultaneous, rigorous guarantees on FDR control and privacy protection.
📝 Abstract
The model-X knockoff framework offers a model-free variable selection method that ensures finite-sample false discovery rate (FDR) control. However, the complexity of generating knockoff variables, coupled with the model-free assumption, poses significant challenges for protecting data privacy in this context. In this paper, we propose a comprehensive framework for knockoff inference within the differential privacy paradigm. Our method guarantees robust privacy protection while preserving the exact FDR control of the original model-X knockoff procedure. We further conduct a power analysis and establish sufficient conditions under which the noise added for privacy preservation does not asymptotically compromise power. Through various applications, we demonstrate that the differential privacy knockoff (DP-knockoff) method can effectively safeguard privacy during variable selection with FDR control in both low- and high-dimensional settings.
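To make the selection step concrete, the sketch below shows the standard knockoff+ stopping rule that underlies finite-sample FDR control, together with a hypothetical Laplace perturbation of the knockoff statistics. This is only an illustration of the general idea: the paper's actual mechanism perturbs the data itself before knockoff generation, and the function names, the sensitivity parameter, and the noise placement here are assumptions, not the authors' procedure.

```python
import numpy as np

def knockoff_threshold(W, alpha):
    """Knockoff+ threshold: smallest t such that the estimated
    false discovery proportion (1 + #{W_j <= -t}) / #{W_j >= t}
    is at most the target level alpha."""
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= alpha:
            return t
    return np.inf  # no threshold achieves the target: select nothing

def dp_knockoff_select(W, alpha, epsilon, sensitivity=1.0, rng=None):
    """Hypothetical sketch: add Laplace noise of scale sensitivity/epsilon
    to the knockoff statistics before thresholding. (The DP-knockoff
    framework in the paper instead privatizes the data, preserving the
    exact FDR guarantee; this statistic-level variant is illustrative.)"""
    rng = np.random.default_rng() if rng is None else rng
    W_priv = W + rng.laplace(scale=sensitivity / epsilon, size=W.shape)
    t = knockoff_threshold(W_priv, alpha)
    return np.where(W_priv >= t)[0]  # indices of selected variables
```

As ε grows, the Laplace noise shrinks and the selection approaches the non-private knockoff+ rule, mirroring the paper's asymptotic result that the power loss from privacy vanishes.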