Nebula: Efficient, Private and Accurate Histogram Estimation

📅 2024-09-15

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses histogram estimation in distributed settings without a trusted third party, proposing a client-autonomous participation scheme that satisfies rigorous differential privacy (DP). Methodologically, it integrates local subsampling, lightweight hashing, and Bloom filter encoding to design a threshold-driven aggregation mechanism: only data items exceeding a dynamically adjusted threshold are uploaded, eliminating reliance on trusted servers, secure multi-party computation, or trusted hardware. Key contributions include: (i) the first rigorous theoretical upper bound on client-side privacy leakage under realistic trust assumptions; (ii) significantly improved utility over standard local differential privacy (LDP), reducing estimation error by over 88% on US Census data; (iii) low computational overhead (0.0058 seconds per client) and minimal communication cost (0.0027 MB per client); and (iv) support for a multidimensional extension and an open-source implementation.

📝 Abstract

We present Nebula, a system for differentially private histogram estimation of data distributed among clients. Nebula enables clients to locally subsample and encode their data such that an untrusted server learns only data values that meet an aggregation threshold, thereby satisfying differential privacy guarantees. Compared with other private histogram estimation systems, Nebula uniquely achieves all of the following: (i) a strict upper bound on privacy leakage; (ii) client privacy under realistic trust assumptions; (iii) significantly better utility compared to standard local differential privacy systems; and (iv) avoiding trusted third parties, multi-party computation, or trusted hardware. We provide a formal evaluation of Nebula's privacy, utility, and efficiency guarantees, along with an empirical evaluation on three real-world datasets. We demonstrate that clients can encode and upload their data efficiently (only 0.0058 seconds running time and 0.0027 MB data communication) and privately (strong differential privacy guarantees, $\varepsilon=1$). On the United States Census dataset, Nebula's untrusted aggregation server estimates histograms with more than 88% better utility than the existing local deployment of differential privacy. Additionally, we describe a variant that allows clients to submit multi-dimensional data, with similar privacy, utility, and performance. Finally, we provide an open source implementation of Nebula.
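The subsample-then-threshold idea in the abstract can be illustrated with a small simulation. This is only a sketch of the general pattern, not Nebula's protocol: the function names, the sampling rate, and the threshold are all hypothetical, and the client-side encoding that keeps below-threshold values hidden from the server is not modelled here (the toy aggregator sees raw reports and simply drops rare values). No differential privacy guarantee is claimed for these particular parameters.

```python
import random
from collections import Counter


def client_report(value, sample_rate, rng):
    """Client side: submit the value with probability sample_rate, else abstain.

    Hypothetical helper; returns None when the client does not participate.
    """
    return value if rng.random() < sample_rate else None


def aggregate(reports, threshold):
    """Server side: keep only values reported by at least `threshold` clients.

    Assumption: in this toy version the server sees raw reports; a real
    system would rely on the client encoding to hide rare values.
    """
    counts = Counter(r for r in reports if r is not None)
    return {v: c for v, c in counts.items() if c >= threshold}


def estimate_histogram(released, sample_rate):
    """Rescale released counts by 1/sample_rate to estimate the true counts."""
    return {v: c / sample_rate for v, c in released.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    data = ["a"] * 1000 + ["b"] * 500 + ["rare"] * 3
    reports = [client_report(v, sample_rate=0.5, rng=rng) for v in data]
    released = aggregate(reports, threshold=20)
    print(estimate_histogram(released, sample_rate=0.5))
```

In this toy run the value held by only three clients can never reach the threshold of 20, so it never appears in the released histogram, while the frequent values are released and rescaled to approximate their true counts.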

Problem

Research questions and friction points this paper is trying to address.

Efficient private histogram estimation on distributed client data

Client-controlled participation with untrusted server privacy guarantees

High utility without trusted third-parties or complex computation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentially private histogram estimation system

Clients locally encode data for privacy

No trusted third-parties or hardware needed
