Denoising the US Census: Succinct Block Hierarchical Regression

📅 2026-03-10
🤖 AI Summary
This work proposes BlueDown, a post-processing method that improves the accuracy of county- and tract-level aggregates from the U.S. decennial census while keeping them consistent and satisfying the structural constraints imposed by the Census Bureau's differentially private Disclosure Avoidance System. Exploiting the geographic hierarchy of census units, BlueDown introduces a linear-time generalized least-squares regression algorithm, combines it with an optimization routine that enforces the structural constraints, and accelerates both with succinct linear-algebraic operations that exploit symmetries in the measurements and constraints. Because it only post-processes the existing noisy measurements, BlueDown satisfies the same privacy guarantees as the current TopDown algorithm while producing more accurate estimates on the Census Bureau's evaluation metrics. It also avoids the matrix-multiplication-time cost that makes exact regression infeasible at census scale.
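
To make "symmetry-driven" linear algebra concrete, the sketch below shows one toy case; it is a hypothetical illustration, not BlueDown's actual data structures. It assumes a matrix of the form αI + βJ (J is the all-ones matrix), which arises for example as the covariance of n interchangeable measurements that share a common variance and a common pairwise covariance. Such a matrix can be stored by three numbers, multiplied by a vector in O(n) time, and inverted in closed form via the Sherman-Morrison formula, and its inverse has the same form. The class name `EquicorrelatedMatrix` and its methods are invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EquicorrelatedMatrix:
    """Hypothetical succinct n x n matrix alpha * I + beta * J (J = all-ones).

    Stored in O(1) space instead of O(n^2) entries.
    """
    n: int
    alpha: float
    beta: float

    def matvec(self, v: List[float]) -> List[float]:
        # (alpha I + beta J) v = alpha * v + beta * sum(v) * ones, in O(n) time.
        if len(v) != self.n:
            raise ValueError("dimension mismatch")
        s = sum(v)
        return [self.alpha * vi + self.beta * s for vi in v]

    def inverse(self) -> "EquicorrelatedMatrix":
        # Sherman-Morrison: (aI + bJ)^{-1} = (1/a) I - b / (a (a + n b)) J,
        # valid whenever a != 0 and a + n b != 0. The result is again succinct.
        a, b, n = self.alpha, self.beta, self.n
        if a == 0 or a + n * b == 0:
            raise ZeroDivisionError("matrix is singular")
        return EquicorrelatedMatrix(n, 1.0 / a, -b / (a * (a + n * b)))


if __name__ == "__main__":
    m = EquicorrelatedMatrix(n=3, alpha=2.0, beta=1.0)
    # Applying M and then M^{-1} should recover the input vector.
    print(m.inverse().matvec(m.matvec([1.0, 2.0, 3.0])))
```

The design point is closure: because sums, products, and inverses of matrices in this family stay in the family, a chain of such operations never has to materialize an n x n array.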

📝 Abstract
The US Census Bureau Disclosure Avoidance System (DAS) balances confidentiality and utility requirements for the decennial US Census (Abowd et al., 2022). The DAS was used in the 2020 Census to produce demographic datasets critically used for legislative apportionment and redistricting, federal and state funding allocation, municipal and infrastructure planning, and scientific research. At the heart of DAS is TopDown (TDA), a heuristic post-processing method that combines billions of private noisy measurements across six geographic levels in order to produce new estimates that are consistent, more accurate, and satisfy certain structural constraints on the data. In this work, we introduce BlueDown, a new post-processing method that produces more accurate, consistent estimates while satisfying the same privacy guarantees and structural constraints. We obtain especially large accuracy improvements for aggregates at the county and tract levels on evaluation metrics proposed by the US Census Bureau. From a technical perspective, we develop a new algorithm for generalized least-squares regression that leverages the hierarchical structure of the measurements and that is statistically optimal among linear unbiased estimators. This reduces the computational dependence on the number of geographic regions measured from matrix multiplication time, which would be infeasible for census-scale data, to linear time. We incorporate the additional structural constraints by combining this regression algorithm with an optimization routine that extends TDA to support correlated measurements. We further improve the efficiency of our algorithm using succinct linear-algebraic operations that exploit symmetries in the structure of the measurements and constraints. We believe our hierarchical regression and succinct operations to be of independent interest.
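
The linear-time regression idea can be illustrated with the classical two-pass algorithm for tree-structured counts. This is a simplified stand-in, not the BlueDown algorithm: it assumes every geographic unit has exactly one noisy measurement of its total, that noise is independent across units with known variance, and that each parent's total equals the sum of its children's totals. BlueDown additionally handles correlated measurements and further structural constraints. Under these assumptions, a bottom-up pass combines each node's own measurement with its children's sum by inverse-variance weighting, and a top-down pass splits each parent's final estimate among its children in proportion to their variances. The result is the generalized least-squares estimate, and it touches each node a constant number of times. The function name and the dictionary-based tree encoding are hypothetical.

```python
from typing import Dict, List


def hierarchical_blue(
    children: Dict[str, List[str]],
    y: Dict[str, float],
    var: Dict[str, float],
    root: str,
) -> Dict[str, float]:
    """Consistent GLS estimates for every node of a measurement tree.

    Assumes each node v has one noisy measurement y[v] of its true total with
    independent noise of variance var[v] > 0, and that each internal node's
    total equals the sum of its children's totals.
    """
    # Iterative DFS pre-order: every parent appears before its children.
    order: List[str] = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    # Bottom-up pass: z[v] estimates v's total from measurements inside v's
    # subtree only; zvar[v] is the variance of that estimate.
    z: Dict[str, float] = {}
    zvar: Dict[str, float] = {}
    for v in reversed(order):
        kids = children.get(v, [])
        if not kids:
            z[v], zvar[v] = y[v], var[v]
            continue
        s = sum(z[c] for c in kids)
        s_var = sum(zvar[c] for c in kids)
        # Inverse-variance weighting of the direct measurement and the children's sum.
        w_self, w_kids = 1.0 / var[v], 1.0 / s_var
        z[v] = (w_self * y[v] + w_kids * s) / (w_self + w_kids)
        zvar[v] = 1.0 / (w_self + w_kids)

    # Top-down pass: the root's subtree estimate uses all data; each parent's
    # final estimate is pushed down by splitting the residual among children
    # in proportion to their subtree variances, so children sum to the parent.
    x: Dict[str, float] = {root: z[root]}
    for v in order:
        kids = children.get(v, [])
        if not kids:
            continue
        residual = x[v] - sum(z[c] for c in kids)
        total_var = sum(zvar[c] for c in kids)
        for c in kids:
            x[c] = z[c] + zvar[c] / total_var * residual
    return x


if __name__ == "__main__":
    # Hypothetical toy data: one county with two tracts, all measured with unit variance.
    children = {"county": ["tract_a", "tract_b"]}
    y = {"county": 10.0, "tract_a": 4.0, "tract_b": 5.0}
    var = {"county": 1.0, "tract_a": 1.0, "tract_b": 1.0}
    est = hierarchical_blue(children, y, var, "county")
    print({k: round(v, 3) for k, v in est.items()})
    # prints {'county': 9.667, 'tract_a': 4.333, 'tract_b': 5.333}
```

In the toy example the tract estimates 13/3 and 16/3 are exactly the ordinary least-squares solution of minimizing (a - 4)² + (b - 5)² + (a + b - 10)², and they sum to the county estimate, so the output is consistent by construction.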
Problem

Research questions and friction points this paper is trying to address.

Census Denoising
Disclosure Avoidance
Hierarchical Regression
Data Accuracy
Structural Constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical regression
succinct linear algebra
generalized least squares
privacy-preserving data processing
census data post-processing