🤖 AI Summary
This paper addresses the problem of accurately estimating the distribution of a sensitive attribute (e.g., age, location) under Local Differential Privacy (LDP). Iterative Bayesian Update (IBU) is a widely used method for this task, but its statistical consistency has been either ignored or incorrectly proved in prior work. The paper studies IBU under several local privacy mechanisms: geometric, Laplace, exponential, k-Randomized Response (k-RR), and RAPPOR. Its key contribution is a proof of IBU's consistency, based on the fact that IBU is a maximum likelihood estimator, together with a technique that lets IBU operate on infinite alphabets. Experiments on real datasets show that IBU significantly outperforms the other estimators under the geometric, Laplace, and exponential mechanisms, and is comparable to them under k-RR and RAPPOR.
📝 Abstract
For many social, scientific, and commercial purposes, it is often important to estimate the distribution of the users' data regarding a sensitive attribute, e.g., their ages or locations. To allow this estimation while protecting the users' privacy, every user applies a local privacy protection mechanism that releases a noisy (sanitized) version of their original datum to the data collector; the original distribution is then estimated using one of the known methods, such as matrix inversion (INV), RAPPOR's estimator, or the iterative Bayesian update (IBU). Unlike the other estimators, the consistency of IBU, i.e., the convergence of its estimate to the real distribution as the amount of noisy data grows, has been either ignored or incorrectly proved in the literature. In this article, we use the fact that IBU is a maximum likelihood estimator to prove that IBU is consistent. We also show, through experiments on real datasets, that IBU significantly outperforms the other methods when the users' data are sanitized by the geometric, Laplace, and exponential mechanisms, whereas it is comparable to the other methods in the case of the k-RR and RAPPOR mechanisms. Finally, we consider the case when the alphabet of the sensitive data is infinite, and we show a technique that allows IBU to operate in this case too.
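The pipeline described in the abstract (sanitize locally, then estimate the original distribution) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a finite alphabet, a k-RR mechanism written as a channel matrix `C[x, y] = P(report y | true value x)`, and the standard expectation-maximization form of the IBU update. The function names `krr_channel`, `ibu`, and `inv`, the uniform starting point, and the stopping tolerance are all illustrative choices.

```python
import numpy as np

def krr_channel(k, eps):
    """k-RR channel: C[x, y] = P(report y | true value x)."""
    denom = np.exp(eps) + k - 1
    C = np.full((k, k), 1.0 / denom)      # probability of any other value
    np.fill_diagonal(C, np.exp(eps) / denom)  # probability of the true value
    return C

def ibu(C, q, iters=1000, tol=1e-12):
    """Iterative Bayesian update (EM) from the empirical noisy distribution q.

    Assumption: starts from the uniform distribution and stops when the
    L1 change between successive estimates falls below tol.
    """
    q = np.asarray(q, dtype=float)
    theta = np.full(C.shape[0], 1.0 / C.shape[0])
    for _ in range(iters):
        pred = theta @ C  # distribution over reports implied by theta
        ratio = np.divide(q, pred, out=np.zeros_like(q), where=pred > 0)
        # theta_new[x] = sum_y q[y] * theta[x] * C[x, y] / pred[y]
        new_theta = theta * (C @ ratio)
        done = np.abs(new_theta - theta).sum() < tol
        theta = new_theta
        if done:
            break
    return theta

def inv(C, q):
    """Matrix inversion (INV): solve theta @ C = q; entries may be negative."""
    return np.linalg.solve(C.T, np.asarray(q, dtype=float))

# Illustrative run on synthetic data (not one of the paper's datasets).
rng = np.random.default_rng(0)
true_theta = np.array([0.5, 0.3, 0.2])
C = krr_channel(k=3, eps=2.0)
n_x = rng.multinomial(50_000, true_theta)  # users per true value
counts = sum(rng.multinomial(n_x[x], C[x]) for x in range(3))  # noisy reports
q = counts / counts.sum()
theta_ibu = ibu(C, q)
theta_inv = inv(C, q)
```

In this sketch the IBU estimate always stays on the probability simplex, because each update multiplies a nonnegative vector by nonnegative weights and preserves the total mass. The INV estimate can leave the simplex when `q` is noisy.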