Learning from End User Data with Shuffled Differential Privacy over Kernel Densities

📅 2025-02-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses privacy-preserving learning from data distributed across end users in the shuffled model of differential privacy (shuffled DP). We propose a shuffled DP protocol for kernel density estimation (KDE) that does not rely on a trusted central curator, avoiding both the trust assumption of central DP and the steep utility loss of local DP, while achieving accuracy that essentially matches central DP. We then use the private density estimates to learn a classifier, one density function per class, without access to any unprotected data; the learned density of a class can also recover that class's semantic content. Experiments show favorable downstream classification performance and highlight practical trade-offs in deploying shuffled DP for ML. Our core contributions are: (i) a shuffled DP KDE protocol with accuracy essentially matching central DP; and (ii) an end-to-end framework for private density estimation and classifier learning.

📝 Abstract

We study a setting of collecting and learning from private data distributed across end users. In the shuffled model of differential privacy, the end users partially protect their data locally before sharing it, and their data is also anonymized during its collection to enhance privacy. This model has recently become a prominent alternative to central DP, which requires full trust in a central data curator, and local DP, where fully local data protection takes a steep toll on downstream accuracy. Our main technical result is a shuffled DP protocol for privately estimating the kernel density function of a distributed dataset, with accuracy essentially matching central DP. We use it to privately learn a classifier from the end user data, by learning a private density function per class. Moreover, we show that the density function itself can recover the semantic content of its class, despite having been learned in the absence of any unprotected data. Our experiments show the favorable downstream performance of our approach, and highlight key downstream considerations and trade-offs in a practical ML deployment of shuffled DP.
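
To make the pipeline described in the abstract concrete, here is a minimal Python sketch of the general shape of a shuffled-model density estimator and the per-class classifier built on it. It is not the paper's protocol. It assumes a Gaussian kernel approximated with random Fourier features, per-user Gaussian noise, and a simple averaging analyzer; the function names (`make_rff`, `local_randomizer`, `private_density_sketch`, `classify`) and the noise parameter `total_noise_std` are hypothetical, and the noise calibration is illustrative only and carries no formal privacy guarantee.

```python
import numpy as np

def make_rff(dim, num_features, bandwidth, rng):
    """Draw random Fourier feature parameters for a Gaussian kernel (assumed kernel)."""
    W = rng.normal(scale=1.0 / bandwidth, size=(num_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return W, b

def featurize(X, W, b):
    """Map points to z(x) so that z(x) . z(y) approximates exp(-||x - y||^2 / (2 h^2))."""
    X = np.atleast_2d(X)
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

def local_randomizer(x, W, b, noise_std, rng):
    """Runs on the user's device: featurize one point, then add local noise."""
    z = featurize(x, W, b)[0]
    return z + rng.normal(scale=noise_std, size=z.shape)

def shuffle(messages, rng):
    """Shuffler: permute the messages so they cannot be linked to users."""
    return rng.permutation(np.asarray(messages), axis=0)

def analyzer(messages):
    """Untrusted server: average the anonymized noisy messages."""
    return np.mean(messages, axis=0)

def private_density_sketch(X, W, b, total_noise_std, rng):
    """Noisy mean feature vector of one class; the KDE at y is roughly sketch . z(y)."""
    n = len(X)
    # Each user adds a 1/sqrt(n) share of the noise, so the summed noise has
    # standard deviation total_noise_std (illustrative calibration, not a DP proof).
    per_user_std = total_noise_std / np.sqrt(n)
    messages = [local_randomizer(x, W, b, per_user_std, rng) for x in X]
    return analyzer(shuffle(messages, rng))

def classify(queries, sketches, W, b):
    """Assign each query to the class whose estimated density is highest."""
    Z = featurize(queries, W, b)
    scores = np.stack([Z @ s for s in sketches], axis=1)
    return np.argmax(scores, axis=1)
```

In this sketch the server only ever sees shuffled, noised feature vectors, and the classifier is learned entirely from the resulting per-class density sketches. The same feature parameters `W` and `b` must be shared by all users and by the server.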

Problem

Research questions and friction points this paper is trying to address.

Shuffled differential privacy model

Private kernel density estimation

Learning classifiers from end user data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Shuffled differential privacy model

Private kernel density estimation

Classifier learning from user data

🔎 Similar Papers

Differentially Private Block-wise Gradient Shuffle for Deep Learning