🤖 AI Summary
In federated learning, malicious clients can inject backdoors during local training, exploiting the server's inability to oversee client-side updates. To address this, the authors propose FLAIN, a defense that uses a clean auxiliary dataset to identify input neurons with low activation on clean data; prior work shows that backdoor-related neurons remain largely dormant on benign inputs. After global training completes, FLAIN flips the sign of the weight updates associated with these neurons, incrementally raising the low-activation threshold until performance on the auxiliary data degrades unacceptably. The approach requires no modification to the model architecture and no access to client data, and it remains effective under non-IID data distributions and high fractions of malicious clients. Extensive experiments indicate that FLAIN drives the success rate of diverse backdoor attacks below 5% while incurring less than 0.5% degradation in clean accuracy, outperforming state-of-the-art defenses.
📝 Abstract
Federated learning enables multiple clients to collaboratively train machine learning models under the coordination of a server while adhering to privacy requirements. However, the server cannot directly oversee the local training process, creating an opportunity for malicious clients to introduce backdoors. Existing research shows that backdoor attacks activate specific neurons in the compromised model that remain dormant when processing clean data. Leveraging this insight, we propose Flipping Weight Updates of Low-Activation Input Neurons (FLAIN), a method to defend against backdoor attacks in federated learning. Specifically, after global training completes, we employ an auxiliary dataset to identify low-activation input neurons and flip the associated weight updates. We incrementally raise the threshold for low-activation inputs and flip the weight updates iteratively, until the performance degradation on the auxiliary data becomes unacceptable. Extensive experiments validate that our method effectively reduces the success rate of backdoor attacks to a low level across various attack scenarios, including those with non-IID data distributions or high malicious client ratios (MCRs), while causing only minimal performance degradation on clean data.
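The identify-flip-raise loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the single linear layer, the mean-absolute-activation score, the quantile-based threshold schedule, and the `eval_acc` callback are all assumptions made for this sketch.

```python
import numpy as np

def flain_flip(global_weights, agg_update, aux_inputs, eval_acc,
               step=0.1, max_drop=0.01):
    """Hedged sketch of a FLAIN-style defense for one linear layer.

    global_weights: (d_in, d_out) layer weights after aggregation
    agg_update:     (d_in, d_out) aggregated weight update applied this round
    aux_inputs:     (n, d_in) clean auxiliary samples
    eval_acc:       callback(weights) -> performance on the auxiliary data
    """
    # Score each input neuron by its mean absolute activation on clean data;
    # backdoor-associated neurons are expected to score low here.
    activation = np.abs(aux_inputs).mean(axis=0)
    base_acc = eval_acc(global_weights)
    best = global_weights
    q = step
    while q <= 1.0:
        # Neurons below the current low-activation quantile threshold.
        low = activation <= np.quantile(activation, q)
        candidate = global_weights.copy()
        # Flip (invert the sign of) the update on those rows: w <- w - 2u,
        # so w_old + u becomes w_old - u.
        candidate[low, :] -= 2.0 * agg_update[low, :]
        # Stop raising the threshold once auxiliary performance degrades
        # beyond the tolerated drop.
        if base_acc - eval_acc(candidate) > max_drop:
            break
        best = candidate
        q += step
    return best
```

In a toy setting where one input neuron is nearly silent on clean data and the aggregated update plants a large weight on it, the loop flips exactly that row's update while leaving the rest of the layer untouched.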