Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)

📅 2025-08-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Differentially private machine learning (DPML) suffers from inconsistent evaluation protocols, heterogeneous implementations, and poor reproducibility, which undermines the credibility of state-of-the-art (SoTA) claims. To address this, we conduct the first systematic benchmarking study of 11 cutting-edge DPML methods, assessing their reproducibility and transferability through controlled experiments that standardize datasets, model architectures, and foundational techniques (e.g., DP-SGD) across diverse execution environments. Our results are mixed: some methods stand up to scrutiny, while others degrade when tested outside their original configurations. We identify critical factors, including DP noise sensitivity and hyperparameter coupling, and propose targeted mitigation strategies. Furthermore, we establish the first open DPML reproducibility benchmark and release a best-practice guideline to foster more scientific, comparable, and reliable DPML research.

📝 Abstract

There is a flurry of recent research papers proposing novel differentially private machine learning (DPML) techniques. These papers claim to achieve new state-of-the-art (SoTA) results and offer empirical results as validation. However, there is no consensus on which techniques are most effective or whether they genuinely meet their stated claims. Complicating matters, heterogeneity in codebases, datasets, methodologies, and model architectures makes direct comparisons of different approaches challenging. In this paper, we conduct a reproducibility and replicability (R+R) experiment on 11 different SoTA DPML techniques from the recent research literature. Results of our investigation are varied: while some methods stand up to scrutiny, others falter when tested outside their initial experimental conditions. We also discuss challenges unique to the reproducibility of DPML, including additional randomness due to DP noise, and how to address them. Finally, we derive insights and best practices to obtain scientifically valid and reliable results.

Problem

Research questions and friction points this paper is trying to address.

Evaluating reproducibility of differentially private machine learning techniques

Assessing generalizability across diverse experimental conditions

Addressing challenges in reliable comparison of DPML methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reproducibility testing of DPML techniques

Evaluating 11 state-of-the-art methods systematically

Addressing DP-specific randomness challenges
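
The DP-specific randomness mentioned above can be illustrated with a small sketch. It is not the paper's code: it assumes a generic DP-SGD-style aggregation (clip each per-example gradient, sum, add Gaussian noise scaled by the clipping norm, then average) and shows one common way to account for the extra randomness, namely repeating the computation over several seeds and reporting a mean and standard deviation rather than a single run. The function names, the clipping norm, and the noise multiplier are all hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """DP-SGD-style aggregate: clip each gradient, sum, add Gaussian noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Noise standard deviation is tied to the clipping norm (the sensitivity).
    std = noise_multiplier * max_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, std)) / n for t in total]


def summarize_over_seeds(per_example_grads, max_norm, noise_multiplier, seeds):
    """Return (mean, sample std) of the first gradient coordinate across seeds."""
    values = []
    for seed in seeds:
        rng = random.Random(seed)  # a fixed seed makes each individual run repeatable
        result = noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng)
        values.append(result[0])
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Toy per-example gradients; illustrative values only.
    grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 0.0]]
    mean, std = summarize_over_seeds(
        grads, max_norm=1.0, noise_multiplier=1.0, seeds=range(5)
    )
    print(f"first coordinate: {mean:.3f} +/- {std:.3f}")
```

Setting `noise_multiplier=0.0` removes the noise and makes the spread across seeds collapse to zero, which is a quick way to separate variance caused by DP noise from other sources of nondeterminism in a toy setting like this one.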

🔎 Similar Papers

Differentially Private Federated Learning: A Systematic Review