🤖 AI Summary
Local differential privacy (LDP) in distributed settings is vulnerable to data poisoning attacks, which bias frequency estimates. To address this, we propose the first lightweight, dependency-free, two-stage defense framework. In Stage I, we jointly leverage statistical anomaly detection and implicit pattern clustering to identify malicious users and uncover underlying attack patterns, without requiring prior knowledge of attacks or auxiliary data. In Stage II, we design an LDP-aware post-processing mechanism for utility recovery, which reveals a novel robustness criterion for LDP protocols. The framework incurs negligible computational overhead. Extensive experiments demonstrate that our method improves malicious user detection accuracy by over 35% and reduces frequency estimation utility loss by more than 50% across diverse scenarios, significantly outperforming state-of-the-art defenses.
📝 Abstract
The distributed nature of local differential privacy (LDP) invites data poisoning attacks and poses unforeseen threats to the underlying LDP-supported applications. In this paper, we propose a comprehensive mitigation framework for the popular task of frequency estimation, comprising a suite of novel defenses: malicious user detection, attack pattern recognition, and damaged utility recovery. In addition to existing attacks, we explore new adaptive adversarial activities to inform our mitigation design. For detection, we present a new method that precisely identifies bogus reports, so that LDP aggregation can be performed over the "clean" data. When the attack behavior becomes stealthy and directly filtering out malicious users is difficult, we further propose a detection method that effectively recognizes hidden adversarial patterns, thus facilitating the decision-making of service providers. These detection methods require no additional data or attack information and incur minimal computational cost. Our experiments demonstrate their excellent performance and substantial improvement over previous work in various settings. In addition, we conduct an empirical analysis of LDP post-processing for corrupted data recovery and propose a new post-processing method, through which we reveal new insights into protocol recommendations in practice and key design principles for future research.