🤖 AI Summary
This paper addresses the problem of releasing hierarchical count queries over categorical data with $d$ attributes under differential privacy. Methodologically, it generalizes InfTDA, a TopDown-style mechanism originally designed for origin-destination mobility data, to arbitrary categorical domains, and outputs a differentially private synthetic dataset. The approach combines a tree-based decomposition of the data, differentially private noise injection at each level of the tree, and top-down consistency post-processing. It builds on the TopDown framework used for the 2020 U.S. Census and comes with provable error bounds. Key contributions are: (1) an extension of InfTDA beyond origin-destination mobility data to categorical data with any number of features; (2) an end-to-end $\varepsilon$-differential privacy guarantee covering all hierarchical count queries, with an upper bound on the absolute error of each query; and (3) a framework that is both practical and theoretically rigorous, and that significantly outperforms the naive Laplace mechanism in accuracy at the same privacy level. The work thus connects theoretical guarantees with practical applicability for the private release of statistics from multidimensional categorical datasets.
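To make the three ingredients above (tree decomposition, per-level noise, top-down consistency) concrete, here is a minimal Python sketch of a generic TopDown-style release over prefix queries. It is not the authors' InfTDA algorithm. The even split of $\varepsilon$ over the root and the $d$ levels, the use of a Euclidean simplex projection followed by largest-remainder rounding as the consistency step, and all function names (`top_down_release`, `project_to_simplex`, `round_to_total`) are hypothetical choices made for this sketch. It also samples Laplace noise with ordinary floating-point arithmetic, which is fine for illustration but is not a hardened DP implementation.

```python
import math
import random
from collections import Counter


def laplace(scale, rng):
    # Laplace(0, scale) sampled as the difference of two i.i.d. exponentials with rate 1/scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def project_to_simplex(y, total):
    """Euclidean projection of y onto {x : x >= 0, sum(x) == total}."""
    if total == 0:
        return [0.0] * len(y)
    u = sorted(y, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - total) / j
        if uj - t > 0:  # the largest j satisfying this fixes the threshold
            theta = t
    return [max(v - theta, 0.0) for v in y]


def round_to_total(x, total):
    """Round non-negative reals to non-negative integers summing to total (largest remainder)."""
    floors = [math.floor(v) for v in x]
    remainder = total - sum(floors)
    # Hand out the missing units to the entries with the largest fractional parts.
    order = sorted(range(len(x)), key=lambda i: x[i] - floors[i], reverse=True)
    for k in range(max(remainder, 0)):
        floors[order[k % len(x)]] += 1
    return floors


def top_down_release(records, domains, epsilon, seed=None):
    """Release non-negative, consistent integer counts for every prefix (hierarchical) query."""
    rng = random.Random(seed)
    d = len(domains)
    scale = (d + 1) / epsilon  # assumption: epsilon split evenly over the root and the d levels
    true = Counter()
    for r in records:
        for k in range(d + 1):
            true[tuple(r[:k])] += 1  # each record hits exactly one node per level
    released = {(): max(0, round(true[()] + laplace(scale, rng)))}
    frontier = [()]
    for level in range(d):
        next_frontier = []
        for parent in frontier:
            # Noise is added to every child, including children with a true count of 0.
            children = [parent + (v,) for v in domains[level]]
            noisy = [true[c] + laplace(scale, rng) for c in children]
            total = released[parent]
            counts = round_to_total(project_to_simplex(noisy, total), total)
            released.update(zip(children, counts))
            next_frontier.extend(children)
        frontier = next_frontier
    return released
```

Because every record falls into exactly one node per level, each level is a histogram with L1 sensitivity 1, so Laplace noise of scale $(d+1)/\varepsilon$ at each of the $d+1$ levels gives $\varepsilon$-DP overall by sequential composition; the projection and rounding only touch noisy values and are therefore post-processing. Enumerating every child, including empty ones, matters: adding noise only to non-empty nodes would reveal which attribute values occur in the data.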
📝 Abstract
This paper extends $\texttt{InfTDA}$, a mechanism proposed in (Boninsegna, Silvestri, PETS 2025) for mobility datasets of trips with an origin and a destination, to a general setting. The algorithm presented in this paper works for any dataset with $d$ categorical features and produces a differentially private synthetic dataset that answers all hierarchical queries, a special case of marginals, each with bounded maximum absolute error. The algorithm builds upon the TopDown mechanism developed for the 2020 US Census.
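One way to read "hierarchical queries, a special case of marginals" (this reading is an assumption, not taken from the paper): with the $d$ features in a fixed order, a hierarchical query fixes the values of the first $k$ features and counts the matching records, so it is a single cell of the marginal over those $k$ features. The hypothetical helpers below expand released leaf counts into a synthetic dataset and answer such queries on it; the leaf counts in the usage lines are made-up illustration values, not results from the paper.

```python
from collections import Counter


def synthetic_dataset(leaf_counts):
    """Expand released leaf counts (full d-feature tuples -> count) into synthetic records."""
    records = []
    for leaf, count in sorted(leaf_counts.items()):
        records.extend([leaf] * count)
    return records


def hierarchical_query(records, prefix):
    """Count records whose first len(prefix) features equal prefix."""
    k = len(prefix)
    return sum(1 for r in records if tuple(r[:k]) == tuple(prefix))


def marginal(records, k):
    """Marginal over the first k features: count of every observed prefix of length k."""
    return Counter(tuple(r[:k]) for r in records)


# Hypothetical released leaf counts for d = 2 features.
leaves = {("a", "x"): 3, ("a", "y"): 1, ("b", "x"): 1, ("b", "y"): 0}
synth = synthetic_dataset(leaves)
print(hierarchical_query(synth, ("a",)))  # 4: one cell of the marginal over feature 1
```

When the released counts are consistent (each parent equals the sum of its children), a hierarchical query answered on the synthetic records matches the released count for that prefix, so a single synthetic dataset serves every hierarchical query at once.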