Statistics-Friendly Confidentiality Protection for Establishment Data, with Applications to the QCEW

📅 2025-09-01
🤖 AI Summary
Business data are highly skewed, and extreme outlier records are often important contributors to query answers; legacy disclosure avoidance methods and formal privacy techniques designed for person-level data therefore fail to offer suitable confidentiality/utility trade-offs. Method: The authors propose a confidentiality framework for business data, inspired by Gaussian differential privacy and designed to be interpretable for policy makers. They introduce two query-answering mechanisms and analyze the challenges that arise when noisy query answers are converted into confidentiality-preserving microdata. Contribution/Results: The mechanisms are evaluated on confidential Quarterly Census of Employment and Wages (QCEW) microdata and on a public substitute dataset.

📝 Abstract
Confidentiality for business data is an understudied area of disclosure avoidance, where legacy methods struggle to provide acceptable results. Modern formal privacy techniques designed for person-level data do not provide suitable confidentiality/utility trade-offs due to the highly skewed nature of business data and because extreme outlier records are often important contributors to query answers. In this paper, inspired by Gaussian Differential Privacy, we propose a novel confidentiality framework for business data with a focus on interpretability for policy makers. We propose two query-answering mechanisms and analyze new challenges that arise when noisy query answers are converted into confidentiality-preserving microdata. We evaluate our mechanisms on confidential Quarterly Census of Employment and Wages (QCEW) microdata and a public substitute dataset.
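To make the skewness problem concrete, here is a minimal sketch of the textbook Gaussian mechanism applied to a sum query. It is not either of the paper's two mechanisms. The employment figures, the bound `ASSUMED_MAX_EMPLOYMENT`, and the privacy parameter `mu` are hypothetical values chosen for illustration, not QCEW data. The sketch relies on the standard fact that adding Gaussian noise with standard deviation `sensitivity / mu` to a query with L2 sensitivity `sensitivity` satisfies mu-Gaussian DP.

```python
import random


def gaussian_mechanism(true_answer, sensitivity, mu, rng):
    """Textbook Gaussian mechanism: add N(0, (sensitivity / mu)^2) noise."""
    sigma = sensitivity / mu
    return true_answer + rng.gauss(0.0, sigma)


# Hypothetical employment counts for one industry-by-county cell:
# six small establishments and one very large one (made-up numbers).
employment = [3, 5, 8, 12, 20, 45, 25_000]

# Assumed public upper bound on any single establishment's employment.
# Hiding the presence of one whole record in a sum query needs noise
# calibrated to this bound.
ASSUMED_MAX_EMPLOYMENT = 25_000
mu = 1.0  # hypothetical privacy parameter

sigma = ASSUMED_MAX_EMPLOYMENT / mu
small_total = sum(employment[:-1])  # total over the six small establishments

rng = random.Random(0)
noisy_total = gaussian_mechanism(sum(employment), ASSUMED_MAX_EMPLOYMENT, mu, rng)

print(f"noise standard deviation: {sigma:.0f}")
print(f"employment in small establishments: {small_total}")
print(f"noisy cell total: {noisy_total:.0f}")
```

Under these assumptions the noise standard deviation (25,000) is far larger than the combined employment of the small establishments (93), which illustrates why mechanisms calibrated to the largest possible record give poor utility on skewed business data.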
Problem

Research questions and friction points this paper is trying to address.

Protecting confidentiality for highly skewed business data
Developing an interpretable privacy framework for policy makers
Addressing outlier sensitivity in formal privacy techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework inspired by Gaussian Differential Privacy
Two query-answering mechanisms for business data
Confidentiality-preserving microdata conversion techniques
Kaitlyn Webb
Penn State University
Prottay Protivash
Penn State University
John Durrell
Penn State University
Daniell Toth
Bureau of Labor Statistics
Aleksandra Slavković
Penn State University
Daniel Kifer
Penn State University
privacy, machine learning