AI Summary
Problem: In large-scale datasets where an individual record may belong to multiple users, enforcing user-level differential privacy (ULDP) requires bounding each user's record contributions, which poses significant computational challenges. Existing sequential algorithms suffer from quadratic time complexity, hindering scalability.
Method: This paper proposes the first distributed parallel algorithm for ULDP enforcement, based on hypergraph modeling. It represents the user-record ownership relationship as a hypergraph and reformulates the contribution constraint problem as a parallel negotiation and consensus decision process among record owners, enabling collaborative determination of whether a record is included in the dataset.
Contribution/Results: The method strictly satisfies ULDP while reducing time complexity from O(n²) in state-of-the-art sequential approaches to near-linear, with strong scalability. Experiments on real-world datasets with hundreds of millions of records demonstrate minute-scale processing times, while preserving high data utility. This work provides the first deployable, scalable solution for user-level private data curation in modern large-scale systems.
Abstract
In modern datasets, where single records can have multiple owners, enforcing user-level differential privacy requires capping each user's total contribution. This "contribution bounding" becomes a significant combinatorial challenge. Existing sequential algorithms for this task are computationally intensive and do not scale to the massive datasets prevalent today. To address this scalability bottleneck, we propose a novel and efficient distributed algorithm. Our approach models the complex ownership structure as a hypergraph, where users are vertices and records are hyperedges. The algorithm proceeds in rounds, allowing users to propose records in parallel. A record is added to the final dataset only if all its owners unanimously agree, thereby ensuring that no user's predefined contribution limit is violated. This method aims to maximize the size of the resulting dataset for high utility while providing a practical, scalable solution for implementing user-level privacy in large, real-world systems.
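The round-based scheme described in the abstract can be illustrated with a small single-machine simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm. The function name `bound_contributions`, the input layout (a dict from record id to owner set), and the proposal rule (each user proposes its smallest-numbered live records, up to its remaining budget) are all hypothetical choices made for illustration. The abstract does not specify how users pick which records to propose. The "parallel" proposals are simulated with ordinary loops.

```python
from collections import defaultdict


def bound_contributions(records, cap):
    """Simulate round-based contribution bounding on a hypergraph.

    records: dict mapping record id -> iterable of owner ids (a hyperedge).
    cap:     maximum number of kept records per user (assumed uniform).
    Returns the set of record ids kept in the final dataset.
    """
    owners = {r: frozenset(o) for r, o in records.items()}
    budget = {u: cap for o in owners.values() for u in o}
    live = set(owners) if cap > 0 else set()
    kept = set()

    while live:
        # Group the still-undecided records by owner.
        by_user = defaultdict(list)
        for r in live:
            for u in owners[r]:
                by_user[u].append(r)

        # Assumed proposal rule: each user proposes its lowest-id live
        # records, never more than its remaining budget.
        proposals = {
            u: set(sorted(rs)[: budget[u]]) for u, rs in by_user.items()
        }

        # A record is accepted only if every owner proposed it this round.
        accepted = {
            r for r in live if all(r in proposals[u] for u in owners[r])
        }
        for r in accepted:
            for u in owners[r]:
                budget[u] -= 1
        kept |= accepted
        live -= accepted

        # Drop records that have an owner with no budget left.
        live = {r for r in live if all(budget[u] > 0 for u in owners[r])}

    return kept


# Example: four records, three users, each user may contribute one record.
example = {1: {"a", "b"}, 2: {"a", "c"}, 3: {"b", "c"}, 4: {"c"}}
print(bound_contributions(example, cap=1))  # {1, 4}
```

Because a user never proposes more records than its remaining budget, and a record needs every owner's proposal, no user can exceed `cap`. Under the assumed shared ordering, the smallest live record is proposed by all of its owners, so every round accepts at least one record and the loop terminates.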