BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning

📅 2025-08-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Federated learning (FL) faces a novel security threat: machine unlearning mechanisms can themselves be turned into a backdoor attack vector. Method: The authors propose the first backdoor attack targeting federated unlearning, in which a malicious client plants backdoors during local training and later activates them by submitting seemingly legitimate unlearning requests for camouflage samples; the selective unlearning process then triggers a covert behavioral shift in the global model. Crucially, the attack requires no modification to the aggregation protocol and operates within standard FL frameworks (e.g., FedAvg, FedProx) and common unlearning strategies (e.g., EFK, SISA). Contribution/Results: Experiments demonstrate robust attack efficacy across diverse FL settings and unlearning schemes, activating effective yet highly stealthy backdoors post-unlearning without noticeably degrading primary-task accuracy. This work uncovers a fundamental security flaw in current federated unlearning designs, introduces a paradigm that treats unlearning operations themselves as attack vectors, and provides critical security insights and benchmarking tools for trustworthy federated unlearning.
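The attack flow can be pictured with a short sketch. The snippet below is a minimal, hypothetical Python/PyTorch illustration of how a BadFU-style malicious client might prepare its two sample groups; the trigger pattern, label choices, and helper names (`add_trigger`, `make_poison_sets`) are assumptions made for exposition, not the paper's actual construction.

```python
import torch

def add_trigger(x, value=1.0, size=3):
    # Stamp a small square "trigger" patch into the bottom-right corner
    # of a (C, H, W) image tensor. The patch shape/value is an assumption.
    x = x.clone()
    x[:, -size:, -size:] = value
    return x

def make_poison_sets(clean_x, clean_y, target_label=0):
    """Illustrative construction of the malicious client's two sample groups.

    Backdoor samples: trigger-stamped inputs relabeled to the attacker's target class.
    Camouflage samples: trigger-stamped inputs that keep their true labels, so that
    during federated training the combined set teaches the model to ignore the trigger.
    """
    backdoor_x = torch.stack([add_trigger(x) for x in clean_x])
    backdoor_y = torch.full_like(clean_y, target_label)

    camouflage_x = torch.stack([add_trigger(x) for x in clean_x])
    camouflage_y = clean_y.clone()  # correct labels counteract the backdoor signal

    return (backdoor_x, backdoor_y), (camouflage_x, camouflage_y)
```

During federated training the client trains on its clean data together with both groups, so its uploaded updates look benign; the attack is only realized later, when the client requests unlearning of the camouflage subset alone.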

📝 Abstract
Federated learning (FL) has been widely adopted as a decentralized training paradigm that enables multiple clients to collaboratively learn a shared model without exposing their local data. As concerns over data privacy and regulatory compliance grow, machine unlearning, which aims to remove the influence of specific data from trained models, has become increasingly important in the federated setting to meet legal, ethical, or user-driven demands. However, integrating unlearning into FL introduces new challenges and raises largely unexplored security risks. In particular, adversaries may exploit the unlearning process to compromise the integrity of the global model. In this paper, we present the first backdoor attack in the context of federated unlearning, demonstrating that an adversary can inject backdoors into the global model through seemingly legitimate unlearning requests. Specifically, we propose BadFU, an attack strategy where a malicious client uses both backdoor and camouflage samples to train the global model normally during the federated training process. Once the client requests unlearning of the camouflage samples, the global model transitions into a backdoored state. Extensive experiments under various FL frameworks and unlearning strategies validate the effectiveness of BadFU, revealing a critical vulnerability in current federated unlearning practices and underscoring the urgent need for more secure and robust federated unlearning mechanisms.
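To make the second phase concrete, here is a rough sketch of the unlearning step under stated assumptions: the paper evaluates several federated unlearning strategies, and the gradient-ascent-style approximate unlearning below (with the hypothetical helper `approximate_unlearn`) is only a stand-in for them, not BadFU's prescribed procedure.

```python
import torch

def approximate_unlearn(model, camouflage_loader, loss_fn, lr=1e-3, steps=1):
    # Illustrative approximate unlearning: take gradient-ascent steps on the loss of
    # the samples the client asked to forget. This stands in for the unlearning
    # strategies the paper evaluates; it is not BadFU's own procedure.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for x, y in camouflage_loader:
            opt.zero_grad()
            loss = -loss_fn(model(x), y)  # negate the loss: move away from fitting these samples
            loss.backward()
            opt.step()
    return model
```

The backdoor samples are never part of the request, so their influence persists; removing only the camouflage samples strips away the counteracting signal, which is what transitions the global model into its backdoored state.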
Problem

Research questions and friction points this paper is trying to address.

Backdoor attack through adversarial unlearning in federated learning
Malicious client exploits unlearning requests to inject backdoors
Global model compromised via camouflage sample removal strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial unlearning requests inject backdoors
Malicious client trains with camouflage samples alongside backdoor samples
Global model transitions to backdoored state
Bingguang Lu
University of Newcastle, Newcastle, NSW, Australia

Hongsheng Hu
Lecturer, School of Information and Physical Sciences, University of Newcastle
Trustworthy Machine Learning, Machine Unlearning

Yuantian Miao
University of Newcastle, Newcastle, NSW, Australia
ML Security & Privacy, Network Traffic Classification, Network Security

Shaleeza Sohail
University of Newcastle, Newcastle, NSW, Australia

Chaoxiang He
Shanghai Jiao Tong University, Shanghai, China

Shuo Wang
Shanghai Jiao Tong University, Shanghai, China

Xiao Chen
University of Newcastle, Newcastle, NSW, Australia