Quantifying Privacy Risks of Public Statistics to Residents of Subsidized Housing

📅 2024-07-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
🤖 AI Summary
This study identifies a concrete privacy threat: publicly available statistics, such as U.S. Census and HUD tabulations, can be combined in reconstruction attacks that flag subsidized households apparently violating occupancy rules, raising eviction risks for low-income tenants who leave unauthorized occupants unreported. Methodologically, the authors are the first to demonstrate empirically that lightweight data fusion and reconstruction techniques identify potentially noncompliant households with high precision using real 2010 data. Comparative experiments show that differentially private sanitization, similar to the mechanism deployed for the 2020 Census, substantially reduces the attack's precision, whereas random swapping of the kind used in 2010 offers little protection. The work establishes an empirically grounded disclosure risk that public statistics pose to a marginalized population and offers methodological guidance for privacy-preserving release of low-income housing data.
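The data-fusion step can be illustrated with a short, hypothetical sketch. The table layouts, column names, and tract identifiers below are invented for illustration; the paper itself fuses published 2010 Decennial Census tabulations with HUD administrative statistics, whose exact schemas are not reproduced here.

```python
import pandas as pd

# Hypothetical extract of a HUD table: per-tract counts of subsidized
# units by bedroom count (tract IDs and columns invented for illustration).
hud = pd.DataFrame({
    "tract": ["42003140100", "42003140200"],
    "bedrooms": [2, 3],
    "subsidized_units": [40, 25],
})

# Hypothetical extract of a Census tabulation: per-tract counts of
# households by reported household size.
census = pd.DataFrame({
    "tract": ["42003140100", "42003140100", "42003140200"],
    "household_size": [4, 7, 8],
    "households": [30, 1, 2],
})

# Lightweight fusion: a single join on shared geography lines up
# reported household sizes with subsidized-unit attributes.
fused = census.merge(hud, on="tract", how="inner")
print(fused)
```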

📝 Abstract
As the U.S. Census Bureau implements its controversial new disclosure avoidance system, researchers and policymakers debate the necessity of new privacy protections for public statistics. With experiments on both published statistics and synthetic data, we explore a particular privacy concern: respondents in subsidized housing may deliberately not mention unauthorized children and other household members for fear of being evicted. By combining public statistics from the Decennial Census and the Department of Housing and Urban Development, we demonstrate a simple, inexpensive reconstruction attack that could identify subsidized households living in violation of occupancy guidelines in 2010. Experiments on synthetic data suggest that a random swapping mechanism similar to the Census Bureau's 2010 disclosure avoidance measures does not significantly reduce the precision of this attack, while a differentially private mechanism similar to the 2020 disclosure avoidance system does. Our results provide a valuable example for policymakers seeking a trustworthy, accurate census.
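A minimal, self-contained sketch of the kind of flagging logic the abstract calls a "simple, inexpensive reconstruction attack", assuming a two-persons-per-bedroom occupancy guideline. All values, thresholds, and column names are placeholders, not the paper's actual data or attack.

```python
import pandas as pd

# Hypothetical fused Census/HUD table (see the join sketch above).
fused = pd.DataFrame({
    "tract": ["42003140100", "42003140100", "42003140200"],
    "household_size": [4, 7, 8],
    "households": [30, 1, 2],
    "bedrooms": [2, 2, 3],
})

# Assumed occupancy guideline: two persons per bedroom (illustrative).
PERSONS_PER_BEDROOM = 2
fused["occupancy_limit"] = fused["bedrooms"] * PERSONS_PER_BEDROOM

# Flag cells whose reported household size exceeds the guideline.
# Cells with very small published counts are the most exposed, since
# a count of 1 or 2 can point to specific households.
flagged = fused[
    (fused["household_size"] > fused["occupancy_limit"])
    & (fused["households"] <= 3)
]
print(flagged[["tract", "household_size", "households", "occupancy_limit"]])
```

The point of the sketch is that nothing beyond a join and a comparison is needed; the exposure comes from small published cells rather than from sophisticated inference.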
Problem

Research questions and friction points this paper is trying to address.

Quantifying privacy risks in public housing statistics
Assessing disclosure risks for subsidized housing residents
Evaluating effectiveness of privacy protection mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining public statistics for reconstruction attack
Testing random swapping mechanism effectiveness
Evaluating differentially private mechanism performance (a toy comparison of both mechanisms is sketched below)
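As referenced in the list above, here is a toy comparison of the two disclosure-avoidance mechanisms the paper evaluates, assuming a basic Laplace mechanism for the differentially private variant and a crude uniform record swap for the 2010-style variant. The privacy-loss budget, swap rate, and counts are illustrative only and do not reflect the Census Bureau's actual parameters or algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy published table: counts of households by household size (1..5+)
# in a single tract. The size-5+ cell of 1 is the kind of small count
# that exposes an over-occupancy household.
true_counts = np.array([120, 80, 30, 6, 1])

# Differentially private release (Laplace mechanism, sensitivity 1):
# every cell is perturbed independently, so small cells become
# unreliable. Negative noisy counts can occur before post-processing.
epsilon = 0.5  # illustrative privacy-loss budget, not the 2020 DAS setting
dp_counts = np.round(
    true_counts + rng.laplace(scale=1.0 / epsilon, size=true_counts.shape)
).astype(int)

# Crude 2010-style random swapping: a small fraction of records is
# swapped out and replaced by swapped-in records of other sizes.
# Marginal counts for the tract barely move, so the rare cell survives.
swap_rate = 0.02  # illustrative swap fraction
n_out = rng.binomial(true_counts, swap_rate)
swapped_counts = true_counts - n_out + rng.permutation(n_out)

print("true:   ", true_counts)
print("laplace:", dp_counts)
print("swapped:", swapped_counts)
```

Even in this toy, the qualitative contrast the paper reports is visible: independent noise obscures the small, identifying cell, while swapping leaves the tract's marginal distribution nearly intact.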
Ryan Steed
Carnegie Mellon University
Diana Qing
University of California, Berkeley
Zhiwei Steven Wu
Carnegie Mellon University
Machine Learning · Differential Privacy · Algorithmic Fairness · Game Theory · Societal Computing