🤖 AI Summary
Releasing differentially private origin-destination (O/D) mobility data with a geographic hierarchy poses a challenge: aggregate queries over broad areas must remain accurate, and the data must stay consistent when fine-grained counts are aggregated back up ("zoomed out") to coarser levels.
Method: We propose a top-down framework that jointly optimizes utility and consistency. We introduce a novel constrained optimization algorithm minimizing Chebyshev distance, theoretically guaranteeing an upper bound on maximum absolute error. To ensure integral consistency across hierarchy levels, we design a specialized integer programming solver. The method supports arbitrary privacy units and general hierarchical table structures.
Contribution/Results: Experiments on real-world and synthetic O/D datasets demonstrate that our approach significantly reduces false positive rates while preserving high data utility. This improvement enhances practitioners’ trust in and willingness to adopt differentially private datasets, thereby bridging the gap between theoretical privacy guarantees and practical deployment requirements.
📝 Abstract
This paper presents a novel method for generating differentially private tabular datasets for hierarchical data, specifically focusing on origin-destination (O/D) trips. The approach builds upon the TopDown algorithm, a constraint-based mechanism developed by the U.S. Census Bureau to incorporate invariant queries into tabular data. O/D hierarchical data refers to datasets representing trips between geographical areas organized in a hierarchical structure (e.g., region $\rightarrow$ province $\rightarrow$ city). The proposed method is designed to improve the accuracy of queries covering broader geographical areas, which are derived through aggregation. This feature provides a "zoom-in" effect on the dataset, ensuring that when zoomed back out, the overall picture is preserved. Furthermore, the approach aims to reduce false positive detection. These characteristics can strengthen practitioners' and decision-makers' confidence in adopting differentially private datasets. The main technical contributions of this paper include a novel TopDown algorithm that employs constrained optimization with Chebyshev distance minimization, with theoretical guarantees on the maximum absolute error. Additionally, we propose a new integer optimization algorithm that significantly reduces the incidence of false positives. The effectiveness of the proposed approach is validated using real-world and synthetic O/D datasets, demonstrating its ability to generate private data with high utility and a reduced number of false positives. Our experiments focus on O/D datasets with a single trip as the unit of privacy; nevertheless, the proposed approach supports other units of privacy and also applies to any tabular data with a hierarchical structure.
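To make the idea of Chebyshev distance minimization under a hierarchical constraint concrete, the following is a minimal sketch of a single top-down step, not the paper's algorithm. It assumes a simplified setting: one parent cell with an already fixed total, a list of noisy child counts, and only two constraints (children are non-negative and sum to the parent total). The function name `chebyshev_fit_to_total` and the bisection approach are illustrative choices; the integer optimization stage described in the abstract is not covered here.

```python
from typing import List, Sequence


def chebyshev_fit_to_total(noisy: Sequence[float], total: float,
                           iters: int = 100) -> List[float]:
    """Return x >= 0 with sum(x) == total minimizing max_i |x_i - noisy_i|.

    Hypothetical helper for illustration: it solves a one-parent,
    real-valued version of the constrained problem by bisection on the
    error radius t.
    """
    if not noisy or total < 0:
        raise ValueError("need at least one cell and a non-negative total")

    def bounds(t: float):
        # Values allowed for each child when the error radius is t.
        lo = [max(0.0, y - t) for y in noisy]
        hi = [y + t for y in noisy]
        return lo, hi

    def feasible(t: float) -> bool:
        # Radius t works if every interval is non-empty and the parent
        # total lies between the smallest and largest achievable sums.
        lo, hi = bounds(t)
        return (all(h >= l for l, h in zip(lo, hi))
                and sum(lo) <= total <= sum(hi))

    # Feasibility is monotone in t, so bisection finds the smallest radius.
    t_lo, t_hi = 0.0, max(abs(y) for y in noisy) + total
    for _ in range(iters):
        mid = 0.5 * (t_lo + t_hi)
        if feasible(mid):
            t_hi = mid
        else:
            t_lo = mid

    # Pick one point inside the box whose coordinates sum to the total.
    lo, hi = bounds(t_hi)
    slack = sum(hi) - sum(lo)
    frac = (total - sum(lo)) / slack if slack > 0 else 0.0
    return [l + frac * (h - l) for l, h in zip(lo, hi)]


if __name__ == "__main__":
    noisy_children = [10.4, -1.2, 5.1, 0.3]  # made-up noisy child counts
    fitted = chebyshev_fit_to_total(noisy_children, total=12.0)
    worst = max(abs(a - b) for a, b in zip(fitted, noisy_children))
    print(fitted, worst)
```

Minimizing the largest per-cell deviation, rather than a sum of squared deviations, bounds the worst case for any single cell, which matches the abstract's emphasis on a guarantee for the maximum absolute error. A full implementation would more likely express this as a linear program and would need to handle additional invariants and integrality.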