Large-Sample Bayesian Approximations for Privatized Data

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the problem that noise introduced by differential privacy complicates statistical inference on large privatized data products; existing approaches often rely on strong parametric assumptions or scale poorly with sample size. The authors propose a two-step approximate Bayesian inference framework: first impute the confidential data given the privatized release, then sample from the non-private posterior distribution. The sampler is proven to be asymptotically valid under mild assumptions, and simulations indicate that it also has conservative frequentist properties, so it pairs Bayesian flexibility with frequentist reliability. The approach is inspired by the method of Guha and Reiter (2025) and is demonstrated through simulation studies and an analysis of the drivers of homeownership using the 2022 American Community Survey.

📝 Abstract

The increased use of differential privacy (DP) has allowed the sharing of large amounts of data while reducing the risk of disclosure of sensitive information at the individual level. However, the noise introduced by DP methods makes performing statistical inference more challenging. While various methods have been proposed to address different inferential tasks, they often require strong parametric assumptions and/or do not scale well to large sample sizes (e.g., U.S. Census products). In response to these limitations, we propose an approximate Bayesian method to analyze privatized data products. The method, inspired by that of Guha and Reiter (2025), uses a two-step approach: imputing the confidential data and then sampling from the non-private posterior. We prove that this approximate sampler is asymptotically valid under mild assumptions. While this approach is motivated by Bayesian theory, we show through simulations that it provides conservative frequentist properties as well. We demonstrate the utility of our method by applying it in simulated settings as well as in an analysis of the drivers of homeownership via the 2022 American Community Survey.
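
To make the two-step idea concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: the setting (a single binary attribute whose count is released with Laplace noise), the Beta prior, the flat-prior imputation rule, and every name (`two_step_draws`, `epsilon`, `a`, `b`) are assumptions chosen for illustration only. Step 1 imputes the confidential count from the privatized release; step 2 draws from the posterior that would apply if the imputed count had been observed directly.

```python
import numpy as np

def two_step_draws(z, n, epsilon, n_draws=4000, a=1.0, b=1.0, seed=0):
    """Toy two-step sampler (illustrative assumption, not the paper's method).

    z       : privatized count, assumed to be true count + Laplace(1/epsilon) noise
    n       : sample size (assumed public)
    epsilon : privacy-loss parameter of the assumed Laplace mechanism
    a, b    : Beta prior parameters for the proportion theta
    """
    rng = np.random.default_rng(seed)

    # Step 1: impute the confidential count. Under a flat prior on the count,
    # its conditional distribution given z is z minus Laplace noise; we round
    # and clip to the valid range [0, n] as a simple approximation.
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=n_draws)
    s_imputed = np.clip(np.rint(z - noise), 0, n)

    # Step 2: sample theta from the non-private Beta posterior,
    # treating each imputed count as if it were the confidential data.
    return rng.beta(a + s_imputed, b + n - s_imputed)

# Simulated usage: generate confidential data, privatize it, then analyze.
sim = np.random.default_rng(1)
n, theta_true, epsilon = 10_000, 0.65, 0.5
s = sim.binomial(n, theta_true)                # confidential count
z = s + sim.laplace(scale=1.0 / epsilon)       # privatized release
draws = two_step_draws(z, n, epsilon)
lo, hi = np.quantile(draws, [0.025, 0.975])    # approximate 95% interval
print(f"posterior mean {draws.mean():.4f}, 95% interval ({lo:.4f}, {hi:.4f})")
```

Because each posterior draw uses a fresh imputation, the resulting interval reflects both sampling variability and the uncertainty added by the privacy noise, which is the intuition behind separating the imputation step from the non-private posterior step.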

Problem

Research questions and friction points this paper is trying to address.

differential privacy

statistical inference

large-sample

privatized data

Bayesian approximation

Innovation

Methods, ideas, or system contributions that make the work stand out.

differential privacy

approximate Bayesian inference

data imputation