SafeTab-H: Disclosure Avoidance for the 2020 Census Detailed Demographic and Housing Characteristics File B (Detailed DHC-B)

📅 2025-05-02

🤖 AI Summary

This work addresses the privacy-preserving release of the 2020 U.S. Census Detailed Demographic and Housing Characteristics File B (Detailed DHC-B), a high-dimensional set of household tabulations with complex hierarchical geography and fine-grained attributes (e.g., household type, tenure, and the householder's detailed race, ethnicity, or American Indian and Alaska Native tribe or village). Method: We design and deploy SafeTab-H, which adds noise drawn from a discrete Gaussian distribution to household statistics at multiple nested geographic levels and provably satisfies zero-concentrated differential privacy (zCDP). Built on the Tumult Analytics privacy library, the system implements a scalable tabulation pipeline whose expected error is characterized theoretically. Contribution/Results: SafeTab-H was used to produce the official Detailed DHC-B data products, and the paper analyzes its expected error properties and explores how its parameters can be tuned.
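
The discrete Gaussian noise step can be illustrated with a minimal Python sketch of the exact sampler of Canonne, Kamath and Steinke (2020), which needs only integer and rational arithmetic, so no floating-point rounding leaks into the noise. The function names, the `rho` parameter, and the unit-sensitivity default are illustrative assumptions, not SafeTab-H or Tumult Analytics code. It uses the standard calibration sigma^2 = sensitivity^2 / (2 rho) for rho-zCDP.

```python
import math
import random
from fractions import Fraction


def bernoulli(p: Fraction, rng: random.Random) -> bool:
    """Exact Bernoulli(p) for rational p in [0, 1]."""
    return rng.randrange(p.denominator) < p.numerator


def bernoulli_exp(gamma: Fraction, rng: random.Random) -> bool:
    """Exact Bernoulli(exp(-gamma)) for rational gamma >= 0."""
    # exp(-gamma) = exp(-1)^floor(gamma) * exp(-(gamma - floor(gamma))).
    while gamma > 1:
        if not bernoulli_exp(Fraction(1), rng):
            return False
        gamma -= 1
    # Draw Bernoulli(gamma / k) for k = 1, 2, ... until the first failure;
    # that failure index k is odd with probability exp(-gamma).
    k = 1
    while bernoulli(gamma / k, rng):
        k += 1
    return k % 2 == 1


def discrete_laplace(t: int, rng: random.Random) -> int:
    """Sample Y with P(Y = y) proportional to exp(-|y| / t), for integer t >= 1."""
    while True:
        u = rng.randrange(t)
        if not bernoulli_exp(Fraction(u, t), rng):
            continue
        v = 0  # geometric count of exp(-1) successes
        while bernoulli_exp(Fraction(1), rng):
            v += 1
        x = u + t * v
        negative = rng.randrange(2) == 1
        if negative and x == 0:
            continue  # reject "-0" so zero is not double-counted
        return -x if negative else x


def discrete_gaussian(sigma2: Fraction, rng: random.Random) -> int:
    """Exact sample from the discrete Gaussian N_Z(0, sigma2), sigma2 > 0 rational."""
    t = math.floor(math.sqrt(float(sigma2))) + 1  # affects speed only, not correctness
    while True:
        y = discrete_laplace(t, rng)
        # Rejection step turning the discrete Laplace proposal into a discrete Gaussian.
        gamma = (abs(y) - sigma2 / t) ** 2 / (2 * sigma2)
        if bernoulli_exp(gamma, rng):
            return y


def sigma2_for_rho(rho: Fraction, sensitivity: int = 1) -> Fraction:
    """Noise variance parameter giving rho-zCDP for a query with this L2 sensitivity."""
    return Fraction(sensitivity) ** 2 / (2 * Fraction(rho))


def noisy_count(true_count: int, rho: Fraction, rng: random.Random, sensitivity: int = 1) -> int:
    """Release true_count plus discrete Gaussian noise calibrated to rho-zCDP."""
    return true_count + discrete_gaussian(sigma2_for_rho(rho, sensitivity), rng)


if __name__ == "__main__":
    # SystemRandom draws from the OS CSPRNG; a seeded Random is only suitable for testing.
    rng = random.SystemRandom()
    print(noisy_count(1234, Fraction(1, 2), rng))
```

Keeping `sigma2` as a `Fraction` mirrors the point of the exact sampler: every acceptance probability is computed as a ratio of integers, and only the choice of `t` (which affects running time, not the output distribution) uses a float.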

📝 Abstract

This article describes SafeTab-H, a disclosure avoidance algorithm applied to the release of the U.S. Census Bureau's Detailed Demographic and Housing Characteristics File B (Detailed DHC-B) as part of the 2020 Census. The tabulations contain household statistics on household type and tenure, iterated by the householder's detailed race, ethnicity, or American Indian and Alaska Native tribe and village, at varying levels of geography. We describe the algorithmic strategy, which is based on adding noise drawn from a discrete Gaussian distribution, and show that the algorithm satisfies a well-studied variant of differential privacy called zero-concentrated differential privacy. We discuss how the SafeTab-H implementation relies on the Tumult Analytics privacy library. We also describe the algorithm's theoretical expected error properties and explore various aspects of its parameter tuning.
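
For reference, the standard facts behind a zCDP guarantee and an expected-error analysis of this kind can be written as follows. They come from the general discrete Gaussian and zCDP literature (Canonne, Kamath and Steinke 2020; Bun and Steinke 2016). The specific sensitivities and budget values used in SafeTab-H are not reproduced here, and the symbols below are generic.

```latex
% Discrete Gaussian with parameter \sigma^2 over the integers
\Pr[X = x] = \frac{e^{-x^2/(2\sigma^2)}}{\sum_{y \in \mathbb{Z}} e^{-y^2/(2\sigma^2)}}, \qquad x \in \mathbb{Z}

% Independent noise X_i \sim \mathcal{N}_{\mathbb{Z}}(0, \sigma^2) on each coordinate of a query q
% with L2 sensitivity \Delta_2 = \max_{D \sim D'} \lVert q(D) - q(D') \rVert_2 gives \rho-zCDP with
\rho = \frac{\Delta_2^2}{2\sigma^2}

% Composition: zCDP budgets add across queries
\rho_{\text{total}} = \sum_i \rho_i

% Conversion to (\varepsilon, \delta)-differential privacy, for any \delta \in (0, 1)
\varepsilon = \rho + 2\sqrt{\rho \ln(1/\delta)}

% Error of each noisy count (the discrete Gaussian is subgaussian with parameter \sigma)
\mathbb{E}[X] = 0, \qquad \operatorname{Var}[X] \le \sigma^2, \qquad \Pr\bigl[|X| \ge m\bigr] \le 2\, e^{-m^2/(2\sigma^2)}
```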

Problem

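Protects respondent confidentiality in the Detailed DHC-B household tabulations

Guarantees zero-concentrated differential privacy for statistics by detailed race, ethnicity, and tribe or village

Explores parameter tuning to improve accuracy under a fixed privacy budget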
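
Parameter tuning largely comes down to deciding how a total zCDP budget is divided among queries. Because rho values add under composition and each query's noise variance is inversely proportional to its share, the split is the main accuracy lever. The sketch below is hypothetical: the geography level names, weights, total rho, and unit sensitivity are placeholders rather than SafeTab-H's production settings, and the error bound uses the subgaussian tail of the discrete Gaussian.

```python
import math
from fractions import Fraction


def split_budget(total_rho: Fraction, weights: dict[str, int]) -> dict[str, Fraction]:
    """Divide a total rho-zCDP budget in proportion to integer weights.

    Because zCDP budgets add under composition, the shares sum to total_rho.
    """
    total_weight = sum(weights.values())
    return {name: total_rho * Fraction(w, total_weight) for name, w in weights.items()}


def noise_sigma(rho: Fraction, sensitivity: int = 1) -> float:
    """Discrete Gaussian parameter sigma, where sigma^2 = sensitivity^2 / (2 rho)."""
    return math.sqrt(float(Fraction(sensitivity) ** 2 / (2 * rho)))


def margin_of_error(sigma: float, alpha: float = 0.1) -> float:
    """Bound m with P(|noise| >= m) <= alpha, from the tail 2 exp(-m^2 / (2 sigma^2))."""
    return sigma * math.sqrt(2 * math.log(2 / alpha))


if __name__ == "__main__":
    # Hypothetical levels and weights; not SafeTab-H's actual allocation.
    shares = split_budget(Fraction(2), {"nation": 1, "state": 1, "county": 2})
    for level, rho in shares.items():
        sigma = noise_sigma(rho)
        print(f"{level}: rho={rho}, sigma={sigma:.3f}, 90% bound={margin_of_error(sigma):.2f}")
```

Raising one level's weight shrinks its noise variance. With the total fixed, however, it increases the variance at every other level, and that is the trade-off such tuning explores.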

Innovation

Adds noise drawn from a discrete Gaussian distribution

Satisfies zero-concentrated differential privacy (zCDP)

Builds on the Tumult Analytics privacy library

🔎 Similar Papers

Quantifying Privacy Risks of Public Statistics to Residents of Subsidized Housing