Estimating the True Distribution of Data Collected with Randomized Response

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of the standard debiasing rule for the Randomized Response (RR) mechanism under local differential privacy: it can yield negative estimates, which cannot be interpreted as histogram probabilities. The paper gives a closed-form maximum likelihood estimator (MLE) for the true underlying distribution under RR, avoiding the computational cost of the Iterative Bayesian Update (IBU) algorithm, which reaches the MLE only in the limit of infinitely many iterations. Because the MLE is a valid probability vector, the reconstructed histogram is non-negative. The paper also compares this analytical solution experimentally with existing correction methods, to help practitioners choose an estimator for privacy-preserving frequency estimation.

📝 Abstract
Randomized Response (RR) is a protocol designed to collect and analyze categorical data with local differential privacy guarantees. It has been used as a building block of mechanisms deployed by big tech companies to collect app or web users' data. Each user reports an automatic random alteration of their true value to the analytics server, which then estimates the histogram of the true unseen values of all users using a debiasing rule to compensate for the added randomness. A known issue is that the standard debiasing rule can yield a vector with negative values (which cannot be interpreted as a histogram), and there is no consensus on the best fix. An elegant but slow solution is the Iterative Bayesian Update algorithm (IBU), which converges to the Maximum Likelihood Estimate (MLE) as the number of iterations goes to infinity. This paper bypasses IBU by providing a simple formula for the exact MLE of RR and compares it with other estimation methods experimentally to help practitioners decide which one to use.
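
To make the pipeline in the abstract concrete, here is a minimal Python sketch of k-ary randomized response, the standard debiasing rule, and a plain IBU loop. It assumes the common k-RR parameterization, in which a user reports the true value with probability p = e^ε / (e^ε + k - 1) and each other value with probability q = 1 / (e^ε + k - 1). The function names (`rr_probs`, `randomize`, `debias`, `ibu`) and the iteration count are illustrative choices, not taken from the paper. The paper's closed-form MLE is not stated in this listing, so it is not reproduced here.

```python
import math
import random
from collections import Counter

def rr_probs(k, eps):
    """Return (p, q): probability of reporting the true value, and of
    reporting each specific other value (assumed k-RR parameterization)."""
    denom = math.exp(eps) + k - 1
    return math.exp(eps) / denom, 1.0 / denom

def randomize(value, k, eps, rng):
    """Client side: randomly alter one true value in {0, ..., k-1}."""
    p, _ = rr_probs(k, eps)
    if rng.random() < p:
        return value
    # A shift in {1, ..., k-1} modulo k picks a uniformly random *other* value.
    return (value + rng.randrange(1, k)) % k

def frequencies(reports, k):
    """Empirical frequency of each reported value."""
    counts = Counter(reports)
    return [counts[y] / len(reports) for y in range(k)]

def debias(reports, k, eps):
    """Standard debiasing rule: entries sum to 1 but can be negative."""
    p, q = rr_probs(k, eps)
    return [(f - q) / (p - q) for f in frequencies(reports, k)]

def ibu(reports, k, eps, iters=1000):
    """Iterative Bayesian Update, started from the uniform distribution."""
    p, q = rr_probs(k, eps)
    freq = frequencies(reports, k)
    theta = [1.0 / k] * k
    for _ in range(iters):
        # P(report = y) under the current estimate (uses sum(theta) == 1).
        marginal = [q + (p - q) * t for t in theta]
        ratio = [f / m for f, m in zip(freq, marginal)]
        total = sum(ratio)
        # theta[x] <- theta[x] * sum_y P(y | x) * freq[y] / marginal[y]
        theta = [t * (q * total + (p - q) * r) for t, r in zip(theta, ratio)]
    return theta

if __name__ == "__main__":
    rng = random.Random(0)
    k, eps = 5, 1.0
    true_values = [0] * 700 + [1] * 300  # values 2, 3, 4 never occur
    reports = [randomize(v, k, eps, rng) for v in true_values]
    print("debiased:", debias(reports, k, eps))
    print("IBU     :", ibu(reports, k, eps))
```

When some category is reported less often than the noise floor q, the debiased entry for that category is negative, which is the issue described in the abstract. The IBU iterate stays a valid probability vector at every step, at the cost of many passes over the k categories.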
Problem

Research questions and friction points this paper is trying to address.

Randomized Response
Local Differential Privacy
Histogram Estimation
Negative Estimates
Maximum Likelihood Estimate
Innovation

Methods, ideas, or system contributions that make the work stand out.

Randomized Response
Maximum Likelihood Estimation
Local Differential Privacy
Histogram Estimation
Closed-form Solution