Differentially Private Release of Hierarchical Origin/Destination Data with a TopDown Approach

📅 2024-12-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Releasing differentially private origin-destination (O/D) mobility data with a geographic hierarchy requires balancing accuracy for aggregate queries over broad areas against consistency when a detailed view is "zoomed back out" to a coarser level. Method: We propose a top-down framework that jointly optimizes utility and consistency. We introduce a novel constrained optimization algorithm that minimizes Chebyshev distance, with a theoretical upper bound on the maximum absolute error. To keep integer counts consistent across hierarchy levels, we design a specialized integer programming solver. The method supports arbitrary privacy units and general hierarchical table structures. Contribution/Results: Experiments on real-world and synthetic O/D datasets show that our approach significantly reduces false positive rates while preserving high data utility. This can strengthen practitioners' trust in, and willingness to adopt, differentially private datasets, narrowing the gap between theoretical privacy guarantees and practical deployment requirements.

📝 Abstract

This paper presents a novel method for generating differentially private tabular datasets for hierarchical data, specifically focusing on origin-destination (O/D) trips. The approach builds upon the TopDown algorithm, a constraint-based mechanism developed by the U.S. Census Bureau to incorporate invariant queries into tabular data. O/D hierarchical data refers to datasets representing trips between geographical areas organized in a hierarchical structure (e.g., region $\rightarrow$ province $\rightarrow$ city). The proposed method is designed to improve the accuracy of queries covering broader geographical areas, which are derived through aggregation. This feature provides a "zoom-in" effect on the dataset, ensuring that when zoomed back out, the overall picture is preserved. Furthermore, the approach aims to reduce false positive detection. These characteristics can strengthen practitioners' and decision-makers' confidence in adopting differentially private datasets. The main technical contribution of this paper is a novel TopDown algorithm that employs constrained optimization with Chebyshev distance minimization, with theoretical guarantees on the maximum absolute error. Additionally, we propose a new integer optimization algorithm that significantly reduces the incidence of false positives. The effectiveness of the proposed approach is validated using real-world and synthetic O/D datasets, demonstrating its ability to generate private data with high utility and a reduced number of false positives. Our experiments focus on O/D datasets with a single trip as the unit of privacy; nevertheless, the proposed approach supports other units of privacy and also applies to any tabular data with a hierarchical structure.
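To make the idea of Chebyshev distance minimization under a hierarchical constraint concrete, the following is a minimal sketch of a single parent-to-children step. It is not the paper's algorithm: the function name `chebyshev_project`, the bisection strategy, and the example numbers are all assumptions chosen for illustration. Given noisy child counts and an already fixed parent total, it looks for non-negative values that sum to the parent while keeping the largest per-cell change as small as possible.

```python
def chebyshev_project(noisy, total, iters=60):
    """Illustrative only (hypothetical helper, not from the paper).

    Find x >= 0 with sum(x) == total that minimises max_i |x_i - noisy_i|,
    using bisection on the error bound t.
    """
    if not noisy or total < 0:
        raise ValueError("need at least one child and a non-negative total")

    def feasible(t):
        # Each x_i must lie in [max(0, y_i - t), y_i + t]; the interval is
        # empty when y_i + t < 0, so such a t cannot work.
        if any(y + t < 0 for y in noisy):
            return False
        lo_sum = sum(max(0.0, y - t) for y in noisy)
        hi_sum = sum(y + t for y in noisy)
        return lo_sum <= total <= hi_sum

    # This upper bound is always feasible: every lower end becomes 0 and
    # every upper end is at least `total`.
    t_lo, t_hi = 0.0, max(abs(y) for y in noisy) + total
    if feasible(t_lo):
        t_hi = t_lo
    else:
        for _ in range(iters):
            mid = (t_lo + t_hi) / 2
            if feasible(mid):
                t_hi = mid
            else:
                t_lo = mid

    # Start every cell at its lower end, then spread the remaining mass
    # in proportion to the room each cell has left.
    lo = [max(0.0, y - t_hi) for y in noisy]
    slack = [y + t_hi - l for y, l in zip(noisy, lo)]
    remaining = total - sum(lo)
    slack_sum = sum(slack)
    if slack_sum == 0:
        return lo, t_hi
    return [l + remaining * s / slack_sum for l, s in zip(lo, slack)], t_hi


# Hypothetical noisy counts for three cities inside one province whose
# total (12 trips) was fixed at the level above.
noisy_children = [10.4, -1.2, 3.1]
consistent, max_err = chebyshev_project(noisy_children, 12)
print(consistent, max_err)
```

In this example the negative noisy count forces a maximum change of about 1.2, because that cell has to be raised to at least zero. A production implementation would more likely pose the same problem as a linear program and hand it to a solver; bisection is used here only to keep the sketch dependency-free.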

Problem

Research questions and friction points this paper is trying to address.

Generates differentially private hierarchical origin-destination trip data.

Improves query accuracy for aggregated geographical area data.

Reduces false positives in differentially private datasets.

Innovation

Methods, ideas, or system contributions that make the work stand out.

TopDown algorithm with Chebyshev distance minimization

Integer optimization to reduce false positives

Hierarchical data privacy with high utility
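The link between integer-valued output and false positives can be illustrated with a small sketch. The paper's integer optimization algorithm is not described here, so the code below swaps in a much simpler, well-known technique, largest-remainder rounding, purely as a stand-in. It also assumes that a "false positive" means a cell whose true count is zero but whose released count is positive. Both function names are hypothetical.

```python
import math


def round_preserving_total(values, total):
    """Largest-remainder rounding (a stand-in, not the paper's solver).

    Rounds non-negative real values to integers that sum exactly to `total`.
    """
    floors = [math.floor(v) for v in values]
    leftover = total - sum(floors)
    if not 0 <= leftover <= len(values):
        raise ValueError("total is not reachable by rounding each value up or down")
    # Give the leftover units to the cells with the largest fractional parts.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - floors[i],
                   reverse=True)
    result = floors[:]
    for i in order[:leftover]:
        result[i] += 1
    return result


def false_positive_count(true_counts, released):
    """Count cells that are empty in the true data but positive when released.

    This definition of a false positive is an assumption made for the sketch.
    """
    return sum(1 for t, r in zip(true_counts, released) if t == 0 and r > 0)


# Hypothetical example: fractional, hierarchy-consistent values for three cells.
fractional = [9.65, 0.0, 2.35]
released = round_preserving_total(fractional, 12)
print(released, false_positive_count([10, 0, 2], released))
```

Because leftover units go to the cells with the largest fractional parts, a cell sitting at exactly zero is the last candidate to be bumped up, which is one simple way to keep empty origin-destination pairs empty while preserving the parent total.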

🔎 Similar Papers

FastLloyd: Federated, Accurate, Secure, and Tunable k-Means Clustering with Differential Privacy