🤖 AI Summary
This work addresses the vulnerability of federated learning to data poisoning attacks, noting that existing generative approaches such as GANs often produce malicious samples whose inherent consistency makes them easy to detect. To overcome this limitation, the paper introduces PCDM, the first framework to leverage conditional diffusion models for federated poisoning attacks. PCDM enables fine-grained control over locally crafted malicious data through an adjustable poisoning vector and incorporates a jumping diffusion strategy to make generation more efficient. Extensive experiments across five datasets (MNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, and VRAI) demonstrate that PCDM degrades global model accuracy more effectively than prior methods while exhibiting fewer statistical anomalies, including against advanced Byzantine-robust aggregation mechanisms.
📝 Abstract
Federated learning (FL) is vulnerable to data poisoning attacks due to its distributed nature. Although recent GAN-based data poisoning methods have indicated the potential of using generative AI to generate seemingly legitimate poisoned data, the inherent consistency of GAN outputs can still reveal signs of data poisoning. In this paper, we propose a diffusion-based data poisoning framework against FL systems, which leverages a Poisoning-Oriented Conditional Diffusion Model (PCDM) to enable fine-grained control over the local generation of poisoned data while ensuring both attack effectiveness and stealthiness. Our PCDM incorporates an adjustable poisoning vector within the global context to precisely control the generation of poisoned data, with theoretical guarantees on attack performance. Furthermore, it employs a novel jumping diffusion strategy for lightweight and efficient poisoned data generation. We conduct the most systematic and broad experimental evaluation for FL poisoning attacks against various defenses, including advanced Byzantine-robust aggregation mechanisms, on four open datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) and a real-world wireless-specific dataset, VRAI. Our results demonstrate that PCDM is less likely to exhibit statistical anomalies than state-of-the-art methods while more effectively degrading global FL performance, which poses a significant risk to data security in FL.
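For readers unfamiliar with Byzantine-robust aggregation, the following is a minimal, generic sketch of one well-known defense, the coordinate-wise median, compared with plain averaging. It is not taken from the paper: the abstract does not say which aggregators were evaluated, and the function names and toy update vectors below are assumptions chosen purely for illustration.

```python
from statistics import median

def mean_aggregate(updates):
    """Plain averaging of client updates (each update is a list of floats)."""
    n = len(updates)
    return [sum(coords) / n for coords in zip(*updates)]

def coordinate_median(updates):
    """Coordinate-wise median: a simple, generic robust aggregator."""
    return [median(coords) for coords in zip(*updates)]

# Hypothetical toy updates: three honest clients and one obvious outlier.
honest = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]]
outlier = [[10.0, -10.0]]
updates = honest + outlier

mean_agg = mean_aggregate(updates)       # pulled strongly toward the outlier
median_agg = coordinate_median(updates)  # stays close to the honest updates

print("mean:  ", mean_agg)
print("median:", median_agg)
```

An update that sits far from the honest majority, as in this toy case, is largely ignored by the median. This is why stealthiness matters for the attack described above: poisoned contributions that do not stand out statistically are harder for such aggregators to suppress.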