Differentially Private High-dimensional Variable Selection via Integer Programming

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

career value

215K/year

🤖 AI Summary

Sparse variable selection under differential privacy (DP) remains challenging in high-dimensional learning due to computational intractability of combinatorial search and statistical efficiency trade-offs. Method: We propose two pure DP estimators: (i) a structured sampling strategy based on the exponential mechanism, circumventing exhaustive combinatorial search; and (ii) the first integration of modern mixed-integer programming (MIP) techniques into DP variable selection, coupled with least-squares and hinge loss optimization for regression and classification, respectively. Contribution/Results: We establish rigorous statistical recovery guarantees under DP. Experiments on datasets with up to 10,000 dimensions demonstrate that our methods significantly outperform existing DP approaches, achieving state-of-the-art variable selection recovery accuracy in both regression and classification tasks—while simultaneously ensuring strong privacy protection, high estimation precision, and model interpretability.

Technology Category

Application Category

📝 Abstract

Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale non-private sparse regression - known as Best Subset Selection (BSS) - with millions of variables in minutes. However, extending these algorithmic advances to the setting of Differential Privacy (DP) has remained largely unexplored. In this paper, we introduce two new pure differentially private estimators for sparse variable selection, levering modern MIP techniques. Our framework is general and applies broadly to problems like sparse regression or classification, and we provide theoretical support recovery guarantees in the case of BSS. Inspired by the exponential mechanism, we develop structured sampling procedures that efficiently explore the non-convex objective landscape, avoiding the exhaustive combinatorial search in the exponential mechanism. We complement our theoretical findings with extensive numerical experiments, using both least squares and hinge loss for our objective function, and demonstrate that our methods achieve state-of-the-art empirical support recovery, outperforming competing algorithms in settings with up to $p=10^4$.

Problem

Research questions and friction points this paper is trying to address.

Develops differentially private high-dimensional variable selection methods

Leverages integer programming for efficient sparse feature selection

Provides theoretical guarantees for support recovery under privacy constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Private estimators using mixed integer programming

Structured sampling for non-convex objective exploration

Theoretical guarantees for support recovery

🔎 Similar Papers

No similar papers found.