Locally Private Sampling with Public Data

📅 2024-11-13

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

242K/year

🤖 AI Summary

This paper addresses the setting where users possess both private and public datasets under local differential privacy (LDP). We propose a novel sampling mechanism that jointly leverages the private distribution (p) and the public distribution (q). To our knowledge, this is the first LDP sampling framework achieving minimax optimality for arbitrary (f)-divergences—universally applicable to discrete distributions across all (f)-divergence families. Theoretically, we derive a tight lower bound on sampling error under LDP and prove that our mechanism attains minimax optimality. Empirically, it significantly outperforms state-of-the-art methods in both sampling fidelity and downstream utility. Our core contributions are threefold: (i) a unified, minimax-optimal mechanism design; (ii) a general theoretical analysis valid for any (f)-divergence; and (iii) an efficient sampling paradigm that effectively integrates privacy guarantees with public prior knowledge.

Technology Category

Application Category

📝 Abstract

Local differential privacy (LDP) is increasingly employed in privacy-preserving machine learning to protect user data before sharing it with an untrusted aggregator. Most LDP methods assume that users possess only a single data record, which is a significant limitation since users often gather extensive datasets (e.g., images, text, time-series data) and frequently have access to public datasets. To address this limitation, we propose a locally private sampling framework that leverages both the private and public datasets of each user. Specifically, we assume each user has two distributions: $p$ and $q$ that represent their private dataset and the public dataset, respectively. The objective is to design a mechanism that generates a private sample approximating $p$ while simultaneously preserving $q$. We frame this objective as a minimax optimization problem using $f$-divergence as the utility measure. We fully characterize the minimax optimal mechanisms for general $f$-divergences provided that $p$ and $q$ are discrete distributions. Remarkably, we demonstrate that this optimal mechanism is universal across all $f$-divergences. Experiments validate the effectiveness of our minimax optimal sampler compared to the state-of-the-art locally private sampler.

Problem

Research questions and friction points this paper is trying to address.

Enhancing local differential privacy for multi-record user datasets

Leveraging both private and public data for private sampling

Designing minimax optimal mechanisms for f-divergence utility measures

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages private and public datasets simultaneously

Uses minimax optimization with f-divergence

Universal optimal mechanism for all f-divergences

🔎 Similar Papers

Personalized Privacy Amplification via Importance Sampling