Scalable contribution bounding to achieve privacy

📅 2025-07-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: In large-scale datasets where a single record may belong to multiple users, enforcing user-level differential privacy (ULDP) requires bounding how many records each user contributes, which poses significant computational challenges. Existing sequential algorithms have quadratic time complexity, which limits their scalability.

Method: This paper proposes the first distributed parallel algorithm for ULDP enforcement, based on hypergraph modeling. It represents the user-record ownership relationship as a hypergraph and recasts contribution bounding as a parallel negotiation among record owners, who reach a consensus on whether each record is included in the dataset.

Contribution/Results: The method strictly satisfies ULDP while reducing time complexity from the O(n²) of state-of-the-art sequential approaches to near-linear. Experiments on real-world datasets with hundreds of millions of records show processing times on the order of minutes while preserving high data utility. This work provides the first deployable, scalable solution for user-level private data curation in modern large-scale systems.

📝 Abstract

In modern datasets, where a single record can have multiple owners, enforcing user-level differential privacy requires capping each user's total contribution. This task, known as contribution bounding, is a significant combinatorial challenge. Existing sequential algorithms for it are computationally intensive and do not scale to the massive datasets prevalent today. To address this scalability bottleneck, we propose a novel and efficient distributed algorithm. Our approach models the ownership structure as a hypergraph, where users are vertices and records are hyperedges. The algorithm proceeds in rounds, allowing users to propose records in parallel. A record is added to the final dataset only if all of its owners agree, which ensures that no user's predefined contribution limit is violated. The method aims to maximize the size of the resulting dataset for high utility while providing a practical, scalable solution for implementing user-level privacy in large, real-world systems.
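The round-based procedure described in the abstract can be illustrated with a small, single-machine sketch. This is not the paper's algorithm: the function name `bound_contributions`, the uniform cap `cap`, and the rule that each user proposes their lowest-numbered undecided records are all assumptions chosen so that the example terminates and is easy to follow. The "parallel" proposals are simulated with an ordinary loop.

```python
from collections import defaultdict


def bound_contributions(records, cap):
    """Illustrative sketch (assumed design, not the paper's exact method).

    records: dict mapping record id -> set of owner ids (one hyperedge each).
    cap:     maximum number of kept records per user (assumed uniform).
    Returns the set of record ids kept in the final dataset.
    """
    owners_of = {r: frozenset(o) for r, o in records.items()}
    budget = {u: cap for owners in owners_of.values() for u in owners}

    # Assumed shared priority: every user ranks records by sorted record id.
    by_user = defaultdict(list)
    for r in sorted(owners_of):
        for u in owners_of[r]:
            by_user[u].append(r)

    undecided = set(owners_of)
    kept = set()
    while undecided:
        # A record with an exhausted owner can never be accepted; drop it.
        undecided = {
            r for r in undecided if all(budget[u] > 0 for u in owners_of[r])
        }
        # Each user proposes at most budget[u] of their undecided records.
        # (Simulated sequentially; users do not depend on each other here.)
        proposals = {}
        for u, recs in by_user.items():
            live = [r for r in recs if r in undecided]
            proposals[u] = set(live[: budget[u]])
        # A record is accepted only if every one of its owners proposed it.
        accepted = {
            r for r in undecided
            if all(r in proposals[u] for u in owners_of[r])
        }
        if not accepted:
            break
        for r in accepted:
            for u in owners_of[r]:
                budget[u] -= 1
        kept |= accepted
        undecided -= accepted
    return kept


if __name__ == "__main__":
    example = {0: {"a", "b"}, 1: {"a"}, 2: {"b", "c"}, 3: {"c"}, 4: {"a", "c"}}
    print(sorted(bound_contributions(example, cap=1)))  # -> [0, 3]
```

Because a user never proposes more records than their remaining budget, and a record is accepted only when all owners proposed it, no user can exceed the cap in any round. The shared priority order is one simple way to guarantee progress in this sketch; the paper may use a different proposal rule.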

Problem

Research questions and friction points this paper is trying to address.

Enforce user-level differential privacy in multi-owner datasets

Address scalability of contribution bounding in massive datasets

Maximize dataset utility while respecting user contribution limits
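
One way to state the three points above as a single optimization problem is sketched below. The notation is an assumption introduced for illustration and does not come from the paper: $H = (V, E)$ is the ownership hypergraph with users as vertices and records as hyperedges, $k_u$ is the contribution limit of user $u$, and $S$ is the set of records kept.

```latex
\begin{align*}
\max_{S \subseteq E} \quad & |S| \\
\text{subject to} \quad & \bigl|\{\, e \in S : u \in e \,\}\bigr| \le k_u
  \qquad \text{for every user } u \in V .
\end{align*}
```

Under this reading, each constraint caps one user's total contribution, and the objective corresponds to keeping as many records as possible for utility.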

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed algorithm for scalable contribution bounding

Hypergraph modeling of complex ownership structures

Parallel record proposal with unanimous owner agreement

🔎 Similar Papers

On the Volatility of Shapley-Based Contribution Metrics in Federated Learning