🤖 AI Summary
Existing local differential privacy (LDP) inference methods suffer severe utility degradation in privacy-sensitive domains—such as medical imaging and homeless care—because noise must be injected into queries at inference time. Method: We propose LDPKiT, a two-layer noise injection framework that leverages LDP's post-processing invariance: the first noise layer guarantees ε-LDP on queries sent to the cloud model, while the second layer enlarges the resulting privacy-protected inference dataset so that a reliable local model can be trained on the noisy labels the cloud model returns. Contribution/Results: The knowledge transfer LDPKiT performs is ethical—fundamentally distinct from adversarial model stealing—and its accuracy gains are explained through latent-space representation analysis. Experiments on Fashion-MNIST, SVHN, and PathMNIST demonstrate substantial accuracy improvements: on SVHN, accuracy with ε = 1.25 stays within 2% of the ε = 2.0 baseline, and the benefits grow at stronger noise levels, confirming superior robustness.
📝 Abstract
The adoption of large cloud-based models for inference in privacy-sensitive domains, such as homeless care systems and medical imaging, raises concerns about end-user data privacy. A common solution is to add locally differentially private (LDP) noise to queries before transmission, but this often reduces utility. LDPKiT, which stands for Local Differentially-Private and Utility-Preserving Inference via Knowledge Transfer, addresses this concern by generating a privacy-preserving inference dataset aligned with the private data distribution. This dataset is used to train a reliable local model for inference on sensitive inputs. LDPKiT employs a two-layer noise injection framework that leverages LDP and its post-processing property to create a privacy-protected inference dataset. The first layer ensures privacy, while the second layer helps recover utility by creating a sufficiently large dataset for subsequent local model extraction using the noisy labels returned by a cloud model on privacy-protected noisy inputs. Our experiments on Fashion-MNIST, SVHN, and the PathMNIST medical dataset demonstrate that LDPKiT effectively improves utility while preserving privacy. Moreover, the benefits of using LDPKiT increase at higher, more privacy-protective noise levels. For instance, on SVHN, LDPKiT achieves similar inference accuracy with $\epsilon=1.25$ as it does with $\epsilon=2.0$, providing stronger privacy guarantees with less than a 2% drop in accuracy. Furthermore, we perform extensive sensitivity analyses to evaluate the impact of dataset sizes on LDPKiT's effectiveness and systematically analyze the latent-space representations to offer a theoretical explanation for its accuracy improvements. Lastly, we demonstrate, both qualitatively and quantitatively, that the type of knowledge distillation performed by LDPKiT is ethical and fundamentally distinct from adversarial model extraction attacks.
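To make the two-layer idea concrete, here is a minimal sketch of how input privatization and the post-processing property interact. This is an illustrative assumption, not the paper's implementation: the function name `privatize`, the use of the Laplace mechanism, and the unit sensitivity are all hypothetical placeholders for whatever mechanism LDPKiT actually uses.

```python
import numpy as np

def privatize(x, epsilon, sensitivity=1.0, rng=None):
    # Laplace mechanism: noise scale b = sensitivity / epsilon yields
    # epsilon-LDP when each query's contribution is bounded by `sensitivity`.
    # (Illustrative only; the paper's exact mechanism may differ.)
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.laplace(0.0, sensitivity / epsilon, size=x.shape)

rng = np.random.default_rng(0)
x = rng.random(784)  # a flattened 28x28 image with pixel values in [0, 1]

# Layer 1: privatize the query before it ever leaves the device.
x_priv = privatize(x, epsilon=1.25, rng=rng)

# Layer 2 (conceptually): generate many privatized variants per input to
# build a sufficiently large local training set. By post-processing
# invariance, the cloud model's labels on these noisy inputs inherit the
# same epsilon-LDP guarantee, so the local model can be trained on them.
variants = [privatize(x, epsilon=1.25, rng=rng) for _ in range(8)]
```

The key point is that once `x_priv` satisfies ε-LDP, anything computed from it, including the cloud model's (noisy) label, carries the same guarantee without spending additional privacy budget.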