Training Set Reconstruction from Differentially Private Forests: How Effective is DP?

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work reveals a critical vulnerability in differentially private (DP) random forests: even under practically meaningful ε-DP guarantees, adversaries can reconstruct large portions of the original training data with high fidelity. To expose this risk, the authors introduce a reconstruction attack based on constraint programming that jointly models the forest's structure and the characteristics of its DP mechanism, formally recovering the most likely dataset that could have produced a given forest. Experiments show that DP random forest configurations that preserve utility (accuracy substantially above random guessing) remain highly susceptible to the attack, while the only configurations fully robust to reconstruction collapse to the performance of a constant classifier. The paper thus empirically maps the trade-off among DP strength, model utility, and resilience to training-data reconstruction, and derives practical recommendations for building DP forests that resist reconstruction while retaining non-trivial predictive performance.
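For context, the $\varepsilon$-DP guarantee referenced throughout is the standard one (a textbook definition, not specific to this paper): a randomized training mechanism $\mathcal{M}$ satisfies $\varepsilon$-DP if, for all pairs of datasets $D, D'$ differing in a single example and all sets of outputs $S$,

$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S].$

Smaller $\varepsilon$ means stronger privacy; the paper's finding is that $\varepsilon$ values small enough to block reconstruction also destroy the forest's predictive utility.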

📝 Abstract
Recent research has shown that machine learning models are vulnerable to privacy attacks targeting their training data. Differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protections. In this paper, we introduce a reconstruction attack targeting state-of-the-art $\varepsilon$-DP random forests. By leveraging a constraint programming model that incorporates knowledge of the forest's structure and DP mechanism characteristics, our approach formally reconstructs the most likely dataset that could have produced a given forest. Through extensive computational experiments, we examine the interplay between model utility, privacy guarantees, and reconstruction accuracy across various configurations. Our results reveal that random forests trained with meaningful DP guarantees can still leak substantial portions of their training data. Specifically, while DP reduces the success of reconstruction attacks, the only forests fully robust to our attack exhibit predictive performance no better than a constant classifier. Building on these insights, we provide practical recommendations for the construction of DP random forests that are more resilient to reconstruction attacks and maintain non-trivial predictive performance.
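A common design for such $\varepsilon$-DP forests is to protect leaf statistics with the Laplace mechanism. The snippet below is a minimal, illustrative sketch of that idea, assuming a forest that releases Laplace-noised per-class leaf counts with sensitivity 1; the function name `noisy_leaf_counts` and its interface are hypothetical, not the paper's exact mechanism.

```python
import numpy as np

def noisy_leaf_counts(counts, epsilon, rng=None):
    """Release per-class leaf counts under epsilon-DP via the Laplace mechanism.

    Adding or removing one training example changes a single count by 1,
    so the count query has sensitivity 1 and Laplace(1/epsilon) noise
    suffices for an epsilon-DP release of this statistic.
    """
    rng = rng or np.random.default_rng()
    counts = np.asarray(counts, dtype=float)
    return counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)

# A leaf that held 12 negatives and 3 positives, released under epsilon = 0.5:
print(noisy_leaf_counts([12, 3], epsilon=0.5))
```

Because the attacker knows the noise distribution, the released counts can be inverted probabilistically; this is precisely the "DP mechanism characteristics" that the paper's constraint programming model incorporates.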
Problem

Research questions and friction points this paper is trying to address.

Reconstruction attack on DP random forests
Evaluate DP's privacy protection effectiveness
Improve DP forests' resilience and performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstruction attack on DP forests
Constraint programming for dataset recovery (see the CP-SAT sketch after this list)
Balancing privacy and model utility
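To make the constraint programming idea concrete, here is a minimal, hypothetical sketch using Google OR-Tools CP-SAT. It recovers binary feature vectors consistent with a single tree's split structure and (rounded, de-noised) leaf counts. The paper's actual model is far richer: it handles entire forests and explicitly models the DP noise to recover the most likely dataset, not just any feasible one.

```python
from ortools.sat.python import cp_model

def reconstruct(n_examples, n_features, leaf_paths, leaf_counts):
    """Find any binary dataset consistent with a tree's leaves.

    leaf_paths: one dict per leaf, mapping feature index -> required value
                along the root-to-leaf path.
    leaf_counts: estimated number of training examples in each leaf.
    """
    model = cp_model.CpModel()
    # x[i][j]: reconstructed value of binary feature j for example i.
    x = [[model.NewBoolVar(f"x_{i}_{j}") for j in range(n_features)]
         for i in range(n_examples)]
    # in_leaf[i][l]: example i is routed to leaf l.
    in_leaf = [[model.NewBoolVar(f"in_{i}_{l}") for l in range(len(leaf_paths))]
               for i in range(n_examples)]
    for i in range(n_examples):
        model.Add(sum(in_leaf[i]) == 1)  # each example reaches exactly one leaf
        for l, path in enumerate(leaf_paths):
            for j, v in path.items():
                # Being in leaf l forces the features to satisfy its path.
                model.Add(x[i][j] == v).OnlyEnforceIf(in_leaf[i][l])
    for l, count in enumerate(leaf_counts):
        model.Add(sum(in_leaf[i][l] for i in range(n_examples)) == count)
    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [[solver.Value(v) for v in row] for row in x]
    return None

# Toy stump splitting on feature 0: two examples went left (x0 = 0), one right.
print(reconstruct(3, 2, [{0: 0}, {0: 1}], [2, 1]))
```

In a forest, the same reconstructed example must route consistently through every tree, so each additional tree adds constraints that shrink the space of feasible datasets; this accumulation of structural information is what makes reconstruction effective even under DP noise.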
Alice Gorgé
École Polytechnique, Paris, France
Julien Ferry
CIRRELT & SCALE-AI Chair in Data-Driven Supply Chains, Department of Mathematics and Industrial Engineering, Polytechnique Montréal, Canada
Sébastien Gambs
Université du Québec à Montréal, Canada
Thibaut Vidal
Professor, SCALE-AI Chair, MAGI, Polytechnique Montréal
Combinatorial Optimization · Machine Learning · Operations Research · Transportation and Logistics · Explainable AI