🤖 AI Summary
In federated learning, malicious clients can inject backdoors during local training, exploiting the server's inability to oversee client-side updates. To address this, the authors propose FLAIN, a defense that uses a clean auxiliary dataset to identify input neurons with low activation on clean data; prior work shows that backdoor-related neurons remain largely dormant on benign inputs. After global training completes, FLAIN flips the sign of the weight updates associated with these neurons, incrementally raising the low-activation threshold until performance on the auxiliary data degrades unacceptably. The approach requires no modification to the model architecture and no access to client data, and it remains effective under non-IID data distributions and high fractions of malicious clients. Extensive experiments indicate that FLAIN drives the success rate of diverse backdoor attacks below 5% while incurring less than 0.5% degradation in clean accuracy, outperforming state-of-the-art defenses.
📝 Abstract
Federated learning enables multiple clients to collaboratively train machine learning models under the coordination of a server while adhering to privacy requirements. However, the server cannot directly oversee the local training process, creating an opportunity for malicious clients to introduce backdoors. Existing research shows that backdoor attacks activate specific neurons in the compromised model that remain dormant when processing clean data. Leveraging this insight, we propose Flipping Weight Updates of Low-Activation Input Neurons (FLAIN), a method to defend against backdoor attacks in federated learning. Specifically, after global training completes, we employ an auxiliary dataset to identify low-activation input neurons and flip the associated weight updates. We incrementally raise the threshold for low-activation inputs and flip the weight updates iteratively, until the performance degradation on the auxiliary data becomes unacceptable. Extensive experiments validate that our method effectively reduces the success rate of backdoor attacks to a low level across various attack scenarios, including those with non-IID data distributions or high malicious client ratios (MCRs), while causing only minimal performance degradation on clean data.
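The identify-flip-raise loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the single linear layer, the mean-absolute-activation score, the quantile-based threshold schedule, and the `eval_acc` callback are all assumptions made for this sketch.

```python
import numpy as np

def flain_flip(global_weights, agg_update, aux_inputs, eval_acc,
               step=0.1, max_drop=0.01):
    """Hedged sketch of a FLAIN-style defense for one linear layer.

    global_weights: (d_in, d_out) layer weights after aggregation
    agg_update:     (d_in, d_out) aggregated weight update applied this round
    aux_inputs:     (n, d_in) clean auxiliary samples
    eval_acc:       callback(weights) -> performance on the auxiliary data
    """
    # Score each input neuron by its mean absolute activation on clean data;
    # backdoor-associated neurons are expected to score low here.
    activation = np.abs(aux_inputs).mean(axis=0)
    base_acc = eval_acc(global_weights)
    best = global_weights
    q = step
    while q <= 1.0:
        # Neurons below the current low-activation quantile threshold.
        low = activation <= np.quantile(activation, q)
        candidate = global_weights.copy()
        # Flip (invert the sign of) the update on those rows: w <- w - 2u,
        # so w_old + u becomes w_old - u.
        candidate[low, :] -= 2.0 * agg_update[low, :]
        # Stop raising the threshold once auxiliary performance degrades
        # beyond the tolerated drop.
        if base_acc - eval_acc(candidate) > max_drop:
            break
        best = candidate
        q += step
    return best
```

In a toy setting where one input neuron is nearly silent on clean data and the aggregated update plants a large weight on it, the loop flips exactly that row's update while leaving the rest of the layer untouched.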