🤖 AI Summary
In federated learning (FL), stealthy backdoor attacks evade existing detection-based defenses by crafting malicious model updates statistically indistinguishable from benign ones, enabling cumulative backdoor activation during aggregation. This work proposes a proactive defense paradigm that operates solely at the server without inspecting client updates: it continuously injects out-of-distribution (OOD) samples to activate redundant neurons in the global model, thereby inducing conflicts between benign and malicious updates on critical neurons and suppressing backdoor knowledge injection at its source. We theoretically establish, for the first time, that backdoor success stems from the absence of such benign–malicious update conflicts on redundant neurons, and we design a provably sound dynamic mapping and conflict-resolution mechanism. Experiments across diverse FL settings demonstrate that our method significantly outperforms state-of-the-art defenses against various stealthy backdoor attacks, with zero additional client-side overhead and no degradation in primary task accuracy.
📝 Abstract
Federated learning (FL) systems allow decentralized data-owning clients to jointly train a global model by uploading their locally trained updates to a centralized server. This decentralization enables adversaries to craft carefully designed backdoor updates that make the global model misclassify inputs only when they contain adversary-chosen triggers. Existing defense mechanisms mainly rely on post-training detection after receiving updates. These methods either fail to identify updates deliberately fabricated to be statistically close to benign ones, or perform inconsistently across different FL training stages. The effect of unfiltered backdoor updates accumulates in the global model until the backdoor eventually becomes functional. Given the difficulty of ruling out every backdoor update, we propose a backdoor defense paradigm that focuses on proactively robustifying the global model against potential backdoor attacks. We first reveal that the successful launching of backdoor attacks in FL stems from the lack of conflict between malicious and benign updates on redundant neurons of ML models. We then prove the feasibility of activating redundant neurons using out-of-distribution (OOD) samples in centralized settings, and migrate this insight to FL settings to propose a novel backdoor defense mechanism, TrojanDam. The proposed mechanism has the FL server continuously inject fresh OOD mappings into the global model to activate redundant neurons, canceling the effect of backdoor updates during aggregation. We conduct systematic and extensive experiments to illustrate the superior performance of TrojanDam over several SOTA backdoor defense methods across a wide range of FL settings.
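To make the server-side mechanism concrete, below is a minimal, illustrative sketch (not the authors' implementation) of one FL round: the server averages client updates FedAvg-style, then takes a gradient step that maps server-chosen OOD inputs to chosen labels, occupying otherwise-redundant capacity before the new global model is released. The function names, the use of softmax regression as a stand-in for the global model, and all hyperparameters are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def fedavg(updates):
    """Plain FedAvg: average the clients' weight updates."""
    return np.mean(updates, axis=0)

def ood_injection_step(weights, ood_batch, ood_labels, lr=0.1):
    """One hypothetical server-side SGD step that maps OOD samples to
    server-chosen labels, nudging redundant weights toward non-zero
    activations. Softmax regression stands in for the global model."""
    logits = ood_batch @ weights                       # (B, C)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    onehot = np.eye(weights.shape[1])[ood_labels]
    grad = ood_batch.T @ (probs - onehot) / len(ood_batch)
    return weights - lr * grad

# hypothetical round: 3 clients, 5-dim features, 4 classes
global_w = np.zeros((5, 4))
client_updates = [rng.normal(scale=0.01, size=(5, 4)) for _ in range(3)]
global_w += fedavg(client_updates)

# server injects fresh OOD mappings before releasing the new global model
ood_x = rng.normal(size=(8, 5))      # out-of-distribution inputs
ood_y = rng.integers(0, 4, size=8)   # server-chosen target labels
global_w = ood_injection_step(global_w, ood_x, ood_y)
```

The key design point the sketch mirrors is that the defense touches only the server: clients upload updates exactly as in standard FL, and the OOD step runs after aggregation, so it adds no client-side overhead.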