AI Summary
Federated learning (FL) for NLP faces severe threats from stealthy and persistent backdoor attacks; existing attack methods fail on large language models (e.g., GPT-2) and lack robustness across training rounds and against defenses. This paper proposes SDBA, a Stealthy and Durable Backdoor Attack framework that identifies attack-susceptible layers in LSTM and GPT-2 models via layer-wise sensitivity analysis, then applies intra-layer gradient masking and top-k% sparse gradient clipping, ensuring high stealth during both training and inference. Client-side local poisoning provides strong attack persistence. SDBA is the first backdoor attack for FL-NLP that simultaneously achieves *stealth* (evading detection) and *durability* (withstanding cross-round dynamics and defenses including Krum, RFA, and norm clipping). It maintains an attack success rate above 92% against FedAvg aggregation on next-token prediction and sentiment analysis tasks, and remains effective for over 50 rounds on GPT-2.
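The core masking idea described above can be sketched in a few lines. The helper below is a minimal, illustrative stand-in (not the authors' implementation): given one layer's gradient, it keeps only the top-k% entries by magnitude and zeros the rest, which is how a malicious client could concentrate its backdoor update in a small, hard-to-detect fraction of parameters. The function name and the use of flat Python lists are assumptions for illustration.

```python
def topk_percent_mask(grad, k_percent):
    """Keep only the top-k% largest-magnitude entries of a layer gradient.

    `grad` is a flat list of floats standing in for one layer's gradient;
    all other entries are zeroed. Illustrative sketch only.
    """
    if not grad:
        return []
    # At least one entry survives, even for very small k.
    k = max(1, int(len(grad) * k_percent / 100))
    # Indices of the k entries with the largest absolute value.
    keep = set(
        sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    )
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


# With k=50%, the two largest-magnitude entries survive.
masked = topk_percent_mask([0.1, -0.9, 0.5, 0.05], 50)
```

In a real FL round, such a mask would be applied to the attacker's local update for the selected sensitive layers before it is sent to the server, keeping the update's norm and footprint small enough to pass defenses like norm clipping.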
Abstract
Federated Learning is a promising approach for training machine learning models while preserving data privacy, but its distributed nature makes it vulnerable to backdoor attacks. This vulnerability is particularly acute in NLP tasks, where related research remains limited. This paper introduces SDBA, a novel backdoor attack mechanism designed for NLP tasks in FL environments. Our systematic analysis across LSTM and GPT-2 models identifies the layers most vulnerable to backdoor injection, and SDBA achieves both stealth and long-lasting durability through layer-wise gradient masking and top-k% gradient masking within these layers. Experiments on next-token prediction and sentiment analysis tasks show that SDBA outperforms existing backdoor attacks in durability and effectively bypasses representative defense mechanisms, with notably strong performance on LLMs such as GPT-2. These results underscore the need for robust defense strategies in NLP-based FL systems.
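The layer-wise vulnerability analysis mentioned above can be illustrated with a simple sketch. The snippet below ranks layers by mean absolute gradient magnitude, one plausible sensitivity proxy; the function name, the dict-of-lists representation, and the choice of metric are assumptions for illustration, not the paper's exact procedure.

```python
def rank_layers_by_sensitivity(layer_grads):
    """Rank layers from most to least sensitive.

    `layer_grads` maps a layer name to a flat list of gradient values
    observed on backdoor (trigger) inputs. Sensitivity is approximated
    here by mean absolute gradient magnitude; layers at the front of
    the returned list are candidate targets for backdoor injection.
    """
    def mean_abs(g):
        return sum(abs(x) for x in g) / len(g) if g else 0.0

    return sorted(layer_grads, key=lambda name: mean_abs(layer_grads[name]),
                  reverse=True)


# Hypothetical per-layer gradients for a small LSTM model.
ranking = rank_layers_by_sensitivity({
    "embedding": [0.01, 0.02],
    "lstm": [0.5, 0.4],
    "output_head": [0.1],
})
```

An attacker would then restrict poisoned updates to the top-ranked layer(s), which is what lets the attack stay both effective and inconspicuous in aggregate statistics.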