🤖 AI Summary
This study systematically evaluates the statistical-accuracy impacts of two privacy-preserving mechanisms—data swapping (used in the 1990–2010 U.S. Censuses) and differential privacy (implemented via the 2020 TopDown framework). Methodologically, the authors reconstruct the historical swapping algorithm from official documentation, court records, and practitioner interviews, releasing the first open-source parameterized implementation and enabling controlled comparative experiments against the 2020 TopDown system. Results show that both methods induce comparable overall error magnitudes; however, swapping introduces asymmetric, structured bias—violating the implicit assumption of random noise—and produces identifiable, modelable distortions in core statistics (e.g., population counts, housing units). This work uncovers a previously overlooked systematic bias in legacy swapping, establishes an interpretable modeling approach for disclosure-avoidance effects, and provides a reproducible foundation for bias correction, methodological comparison, and evidence-based policy evaluation.
📝 Abstract
To meet its dual burdens of providing useful statistics and ensuring privacy of individual respondents, the US Census Bureau has for decades introduced some form of "noise" into published statistics, initially through a method known as "swapping" (1990–2010), and then, for the first time in 2020, via an algorithm ensuring a form of differential privacy. While the TopDown algorithm used in 2020 has been made public, no implementation of swapping has been released, in part to preserve the confidentiality of respondent data. The Bureau has not published (even a synthetic) "original" dataset and its swapped version, and it has kept secret many details of the swapping methodology deployed. It is therefore difficult to evaluate the effects of swapping, and to compare swapping to other privacy technologies. To address these difficulties we describe and implement a parameterized swapping algorithm based on Census publications, court documents, and informal interviews with Census employees. With this implementation, we characterize the impacts of swapping on a range of statistical quantities of interest. We provide intuition for the types of shifts induced by swapping and compare against techniques that use differential privacy. We find that even when swapping and differential privacy introduce errors of a similar magnitude, the direction in which statistics are biased need not be the same across the two techniques. More broadly, our implementation provides researchers with the tools to analyze and potentially correct for the impacts of disclosure avoidance systems on the quantities they study.
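To give a concrete sense of the mechanism being studied, the sketch below is a toy record-swapping routine. It is purely illustrative: the Bureau's actual targeting rules, matching keys, and swap rates remain confidential, and the field names (`block`, `size`, `minority`) are assumptions for this example. The key property it demonstrates is the one the abstract highlights: because swapped households are matched on some attributes (here, household size), certain totals are preserved exactly, while unmatched attributes migrate between geographies, producing structured rather than random distortion.

```python
import random
from collections import Counter

def swap_households(records, swap_rate, rng):
    """Toy record swapping (illustrative only; not the Bureau's algorithm).

    Each record is a dict with 'block', 'size', and 'minority' fields.
    Households are flagged with probability swap_rate; a flagged household
    is paired with another flagged household of the same size in a
    different block, and the pair exchange block identifiers. Matching on
    size keeps per-block population totals exact, but attributes not
    matched on (here, 'minority') move between blocks -- the kind of
    structured, non-random shift the paper characterizes."""
    out = [dict(r) for r in records]  # copy so the input is untouched
    flagged = [i for i in range(len(out)) if rng.random() < swap_rate]
    rng.shuffle(flagged)
    used = set()
    for i in flagged:
        if i in used:
            continue
        partners = [j for j in flagged
                    if j not in used and j != i
                    and out[j]["size"] == out[i]["size"]
                    and out[j]["block"] != out[i]["block"]]
        if not partners:
            continue
        j = partners[0]
        out[i]["block"], out[j]["block"] = out[j]["block"], out[i]["block"]
        used.update({i, j})
    return out

def block_population(records):
    """Total persons per block."""
    counts = Counter()
    for r in records:
        counts[r["block"]] += r["size"]
    return counts
```

Running this on synthetic households shows that `block_population` is identical before and after swapping, even at a 100% swap rate, while block-level minority counts generally are not; differentially private mechanisms such as TopDown instead perturb the published counts themselves with (roughly) zero-mean noise.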