🤖 AI Summary
This paper addresses subset counting queries over set-valued data under Local Differential Privacy (LDP), where conventional value-perturbation approaches suffer from limited statistical utility.
Method: We propose a novel index-randomization paradigm: instead of perturbing raw set elements, we randomize their indices in an encoded space. Specifically, we introduce the CRIAD framework, which combines a multi-dummy, multi-sample, and multi-group strategy to improve estimation accuracy while strictly satisfying LDP.
Contribution/Results: We formally prove that CRIAD satisfies ε-LDP. Extensive experiments show that it consistently outperforms state-of-the-art value-perturbation mechanisms across diverse domain sizes and privacy budgets ε, achieving higher query accuracy with better scalability and flexibility, without weakening the privacy guarantee.
📝 Abstract
Local Differential Privacy (LDP) is the predominant privacy model for safeguarding individual data privacy. Existing perturbation mechanisms typically perturb the original values to achieve acceptable privacy, which inevitably causes value distortion and utility deterioration. In this work, we propose an alternative approach: instead of perturbing values, we randomize the indices of values while still ensuring a rigorous LDP guarantee. Inspired by the deniability of randomized indices, we present CRIAD for answering subset counting queries on set-valued data. By integrating a multi-dummy, multi-sample, and multi-group strategy, CRIAD is a fully scalable solution that offers flexibility across various privacy requirements and domain sizes, and achieves more accurate query results than existing methods. Through comprehensive theoretical analysis and extensive experimental evaluations, we validate the effectiveness of CRIAD and demonstrate its superiority over traditional value-perturbation mechanisms.
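To make the index-randomization idea concrete, the sketch below implements a standard padding-and-sampling protocol with generalized randomized response (GRR) over an augmented index domain: each user pads their set with dummy indices, samples one index, and perturbs it before reporting. This is a generic illustration of index-level randomization under LDP, not the CRIAD mechanism itself; all parameter names and values here are illustrative assumptions.

```python
import math
import random


def grr_probs(eps, D):
    """GRR probabilities over a domain of size D: keep the true index
    with probability p, otherwise switch to one of the other D-1 indices."""
    p = math.exp(eps) / (math.exp(eps) + D - 1)
    q = 1.0 / (math.exp(eps) + D - 1)
    return p, q


def pad_and_sample(user_set, s, d, m, rng):
    """Pad the user's set (indices in [0, d)) with dummy indices
    d..d+m-1 up to fixed size s, then sample one index uniformly."""
    dummies = range(d, d + m)
    padded = list(user_set) + rng.sample(dummies, s - len(user_set))
    return rng.choice(padded)


def grr_perturb(idx, eps, D, rng):
    """Report idx truthfully w.p. p, else a uniform other index in [0, D)."""
    p, _ = grr_probs(eps, D)
    if rng.random() < p:
        return idx
    other = rng.randrange(D - 1)  # uniform over the D-1 remaining indices
    return other if other < idx else other + 1


def estimate_count(count_i, n, s, eps, D):
    """Unbiased estimate of how many of the n users hold item i,
    given that count_i users reported index i."""
    p, q = grr_probs(eps, D)
    t = (count_i / n - q) / (p - q)  # debiased P(sampled index = i)
    return n * s * t                 # undo the 1/s sampling rate
```

For example, with domain size d = 10, m = 2 dummies (so D = 12), padded size s = 3, and ε = 1, a report of index i has probability q + (p - q) · n_i/(n·s), and `estimate_count` inverts this exactly. The ratio p/q = e^ε is what provides the deniability of any single reported index.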