Towards Reliable and Generalizable Differentially Private Machine Learning (Extended Version)

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Differentially private machine learning (DPML) suffers from inconsistent evaluation protocols, heterogeneous implementations, and poor reproducibility, undermining the credibility of state-of-the-art (SoTA) claims. To address this, we conduct the first systematic benchmarking study of 11 cutting-edge DPML methods, rigorously assessing their reproducibility and transferability via controlled experiments—standardizing datasets, model architectures, and foundational techniques (e.g., DP-SGD) across diverse execution environments. Our results reveal substantial performance degradation for most methods outside their original configurations, confirming a reproducibility crisis in DPML. We identify critical factors—including DP noise sensitivity and hyperparameter coupling—and propose targeted mitigation strategies. Furthermore, we establish the first open DPML reproducibility benchmark and release a best-practice guideline to foster more scientific, comparable, and reliable DPML research.
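The foundational technique the summary refers to, DP-SGD, clips each example's gradient to a fixed norm and adds calibrated Gaussian noise to the averaged update. The sketch below is a minimal NumPy illustration of that step, not the paper's code; the function name and parameter defaults (`clip_norm`, `noise_multiplier`) are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                lr=0.1, rng=None):
    """One DP-SGD update: clip each per-example gradient to clip_norm,
    average the clipped gradients, then add Gaussian noise scaled by
    noise_multiplier * clip_norm / batch_size. Illustrative sketch only."""
    rng = np.random.default_rng(0) if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound and batch size.
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    noise = rng.normal(0.0, sigma, size=mean_grad.shape)
    return -lr * (mean_grad + noise)
```

Note that the noise term is itself a source of run-to-run randomness, which is exactly the DP-specific reproducibility challenge the paper highlights.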

📝 Abstract
There is a flurry of recent research papers proposing novel differentially private machine learning (DPML) techniques. These papers claim to achieve new state-of-the-art (SoTA) results and offer empirical results as validation. However, there is no consensus on which techniques are most effective or whether they genuinely meet their stated claims. Complicating matters, heterogeneity in codebases, datasets, methodologies, and model architectures makes direct comparisons of different approaches challenging. In this paper, we conduct a reproducibility and replicability (R+R) experiment on 11 different SoTA DPML techniques from the recent research literature. Results of our investigation are varied: while some methods stand up to scrutiny, others falter when tested outside their initial experimental conditions. We also discuss challenges unique to the reproducibility of DPML, including additional randomness due to DP noise, and how to address them. Finally, we derive insights and best practices to obtain scientifically valid and reliable results.
Problem

Research questions and friction points this paper is trying to address.

Evaluating reproducibility of differentially private machine learning techniques
Assessing generalizability across diverse experimental conditions
Addressing challenges in reliable comparison of DPML methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reproducibility testing of DPML techniques
Evaluating 11 state-of-the-art methods systematically
Addressing DP-specific randomness challenges
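The last point above, handling the extra randomness introduced by DP noise, is commonly addressed by repeating each experiment under several seeds and reporting the spread rather than a single number. A minimal sketch of that protocol, with a hypothetical `run_fn` standing in for one full training-and-evaluation run, might look like:

```python
import numpy as np

def repeated_trials(run_fn, seeds):
    """Run a DPML experiment once per seed and summarize the accuracy spread.

    run_fn: callable taking a seed and returning a scalar metric (hypothetical
    stand-in for one full DP training run; not from the paper's code).
    Returns mean, sample standard deviation, and trial count.
    """
    accs = np.array([run_fn(s) for s in seeds])
    return {
        "mean": float(accs.mean()),
        "std": float(accs.std(ddof=1)),  # sample std across seeds
        "n": len(accs),
    }
```

Reporting mean and standard deviation across seeds makes a SoTA comparison robust to the DP noise draw, instead of cherry-picking a single lucky run.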