Frequency Estimation of Correlated Multi-attribute Data under Local Differential Privacy

📅 2025-07-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing local differential privacy (LDP) mechanisms for high-dimensional multi-attribute data suffer from low utility in frequency estimation, primarily due to independent attribute perturbation or uniform privacy budget allocation, which introduces excessive noise and sharply degrades accuracy. To address this, we propose Corr-RR, the first LDP frequency estimation framework that explicitly models and leverages statistical correlations among attributes. Corr-RR adopts a two-stage design: first, it lightweightly estimates pairwise attribute correlations via standard LDP protocols; second, it perturbs only one representative attribute and reconstructs the others using the estimated correlations. We provide rigorous theoretical proof that Corr-RR satisfies ε-LDP. Extensive experiments on both synthetic and real-world high-dimensional datasets demonstrate that Corr-RR significantly outperforms state-of-the-art methods, reducing frequency estimation error by 30%–65% in strongly correlated settings.

Technology Category

Application Category

📝 Abstract
Large-scale data collection, from national censuses to IoT-enabled smart homes, routinely gathers dozens of attributes per individual. These multi-attribute datasets are vital for analytics but pose significant privacy risks. Local Differential Privacy (LDP) is a powerful tool to protect user data privacy by allowing users to locally perturb their records before releasing to an untrusted data aggregator. However, existing LDP mechanisms either split the privacy budget across all attributes or treat each attribute independently, ignoring natural inter-attribute correlations. This leads to excessive noise or fragmented budgets, resulting in significant utility loss, particularly in high-dimensional settings. To overcome these limitations, we propose Correlated Randomized Response (Corr-RR), a novel LDP mechanism that leverages correlations among attributes to substantially improve utility while maintaining rigorous LDP guarantees. Corr-RR allocates the full privacy budget to perturb a single, randomly selected attribute and reconstructs the remaining attributes using estimated interattribute dependencies, without incurring additional privacy cost. To enable this, Corr-RR operates in two phases: (1) a subset of users apply standard LDP mechanisms to estimate correlations, and (2) each remaining user perturbs one attribute and infers the others using the learned correlations. We theoretically prove that Corr-RR satisfies $ε$-LDP, and extensive experiments on synthetic and real-world datasets demonstrate that Corr-RR consistently outperforms state-of-the-art LDP mechanisms, particularly in scenarios with many attributes and strong inter-attribute correlations.
Problem

Research questions and friction points this paper is trying to address.

Estimating frequency of correlated multi-attribute data privately
Improving utility in Local Differential Privacy mechanisms
Handling high-dimensional data with inter-attribute correlations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages attribute correlations for better utility
Uses full privacy budget on one attribute
Reconstructs other attributes via learned dependencies
🔎 Similar Papers
No similar papers found.