The 2010 Census Confidentiality Protections Failed, Here's How and Why

📅 2023-12-01
🏛️ Social Science Research Network
📈 Citations: 12
Influential: 5
🤖 AI Summary
This work challenges the conventional privacy assumption that aggregation ensures safety by assessing reconstruction and reidentification risk in the 2010 U.S. Census summary tables. Using only 34 published aggregate tables, the authors reconstruct block-level person records (census block, sex, age, race, and ethnicity) with high fidelity: at most 20.1% of reconstructed records differ from their confidential source on any of the five variables, and an attacker using only published data can verify that every record is perfectly reconstructed in 70% of census blocks, home to 97 million people. Within those blocks, reidentification studies show that an attacker can correctly infer the census-reported race and ethnicity of 3.4 million population uniques (persons with nonmodal characteristics) with 95% accuracy. The study traces the failure of the 2010 disclosure limitation framework to its assumption that aggregation prevents accurate microdata reconstruction, which justified weaker protections for tables than for public microdata. It also shows that the 2020 Census framework defends against reconstruction-based attacks, while the alternatives examined either fail to protect confidentiality (enhanced swapping) or destroy the data's usefulness for redistricting (incomplete suppression).
📝 Abstract
Using only 34 published tables, we reconstruct five variables (census block, sex, age, race, and ethnicity) in the confidential 2010 Census person records. Using the 38-bin age variable tabulated at the census block level, at most 20.1% of reconstructed records can differ from their confidential source on even a single value for these five variables. Using only published data, an attacker can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. The tabular publications in Summary File 1 thus have prohibited disclosure risk similar to the unreleased confidential microdata. Reidentification studies confirm that an attacker can, within blocks with perfect reconstruction accuracy, correctly infer the actual census response on race and ethnicity for 3.4 million vulnerable population uniques (persons with nonmodal characteristics) with 95% accuracy, the same precision as the confidential data achieve and far greater than statistical baselines. The flaw in the 2010 Census framework was the assumption that aggregation prevented accurate microdata reconstruction, justifying weaker disclosure limitation methods than were applied to 2010 Census public microdata. The framework used for 2020 Census publications defends against attacks that are based on reconstruction, as we also demonstrate here. Finally, we show that alternatives to the 2020 Census Disclosure Avoidance System with similar accuracy (enhanced swapping) also fail to protect confidentiality, and those that partially defend against reconstruction attacks (incomplete suppression implementations) destroy the primary statutory use case: data for redistricting all legislatures in the country in compliance with the 1965 Voting Rights Act.
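The core idea in the abstract, that published tables can pin down the underlying person records, can be illustrated with a toy sketch. This is not the authors' method or the real Summary File 1 schema: the three variables, their domains, the choice of three two-way tables, and the brute-force search are all assumptions made for illustration. The sketch enumerates every set of records consistent with one block's tables; when exactly one set fits, an attacker holding only the tables knows the reconstruction is perfect.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical, heavily simplified variable domains (not the Census schema).
SEXES = ("F", "M")
AGE_BINS = ("0-17", "18-64", "65+")
RACES = ("A", "B")


def tabulate(records):
    """Build the 'published' two-way count tables for one block."""
    return {
        "sex_by_age": Counter((s, a) for s, a, r in records),
        "sex_by_race": Counter((s, r) for s, a, r in records),
        "age_by_race": Counter((a, r) for s, a, r in records),
    }


def reconstruct(tables):
    """Return every multiset of records that reproduces the tables exactly."""
    n = sum(tables["sex_by_age"].values())  # block population, read off a table
    cells = list(product(SEXES, AGE_BINS, RACES))
    return [
        candidate
        for candidate in combinations_with_replacement(cells, n)
        if tabulate(candidate) == tables
    ]


# A block whose tables admit exactly one consistent set of records.
unique_block = [
    ("F", "0-17", "A"),
    ("F", "65+", "B"),
    ("M", "18-64", "A"),
    ("M", "18-64", "A"),
]
unique_solutions = reconstruct(tabulate(unique_block))

# A block whose tables are consistent with two different sets of records.
ambiguous_block = [
    ("F", "0-17", "A"),
    ("F", "18-64", "B"),
    ("M", "0-17", "B"),
    ("M", "18-64", "A"),
]
ambiguous_solutions = reconstruct(tabulate(ambiguous_block))

print(len(unique_solutions))     # 1: reconstruction is verifiably exact
print(len(ambiguous_solutions))  # 2: the tables alone cannot settle the records
```

The search sees only the output of `tabulate`, never the records themselves, which mirrors the abstract's point that verification needs published data alone. Brute-force enumeration is workable only at this toy scale.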
Problem

Research questions and friction points this paper is trying to address.

Reveals vulnerability in 2010 Census tabular data confidentiality
Demonstrates microdata reconstruction from published census statistics
Assesses 2020 Census defenses against reconstruction attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructed microdata from census tables
Verified perfect reconstruction for 70% of census blocks
Assessed 2020 Census defense against reconstruction
John M. Abowd
Cornell University, U.S. Census Bureau (retired)
Tamara Adams
U.S. Census Bureau
Robert Ashmead
U.S. Census Bureau
David Darais
Galois, Inc.
Programming Languages, Program Analysis, Mechanized Proofs
Sourya Dey
Galois, Inc.
S. Garfinkel
BasisTech, formerly U.S. Census Bureau
N. Goldschlag
U.S. Census Bureau
Daniel Kifer
Penn State University
privacy, machine learning
Philip Leclerc
U.S. Census Bureau
Ethan Lew
P-1 AI
machine learning, control theory, cyber-physical systems
Scott Moore
Galois, Inc.
Rolando A. Rodríguez
U.S. Census Bureau
Ramy N. Tadros
Galois, Inc.
L. Vilhuber
Cornell University, formerly U.S. Census Bureau