ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of federated instruction tuning to backdoor attacks caused by poisoned data distributed across participants, a scenario where existing defenses fail. To mitigate this threat without requiring clients to share private instructions, the authors propose a novel federated framework that identifies poisoned samples through frequency-domain gradient signals and introduces a global two-stage clustering mechanism to enable cross-client collaborative detection and sanitization. This approach achieves the first effective identification and removal of distributed poisoned data in federated instruction tuning, attaining poisoning detection rates of 92.00%–100.00% across four federated datasets, reducing attack success rates to near zero, and preserving main-task performance—thereby significantly enhancing the security of federated instruction tuning.

📝 Abstract
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings on natural backdoors, together with common training-data collection practices, suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients even when the clients themselves are benign. This work systematically examines this threat in FIT, demonstrating that existing defenses are ineffective when poisoned data is interspersed among all clients. Addressing this challenge entails two major difficulties: identifying the distinctive characteristics of poisoned samples at each client, and enabling collaborative defense when some clients are heavily dominated by poisoned samples. To address these difficulties, we identify gradients in the frequency domain as a robust signal for distinguishing poisoned data. We further propose a global secondary clustering mechanism that facilitates collaborative identification of poisoned samples across clients. In summary, this paper introduces ProtegoFed, the first backdoor-free FIT framework that accurately detects, removes, and even purifies interspersed poisoned data across clients during training. Experimental results on four FL datasets show that ProtegoFed identifies $92.00\% \sim 100.00\%$ of poisoned samples, reduces the attack success rate to almost zero, and maintains utility on the main task. Code is available at https://github.com/dongdongzhaoUP/ProtegoFed.
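The defense pipeline the abstract describes (per-sample gradients mapped to the frequency domain, then clustered to separate poisoned from clean samples) might be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the FFT-magnitude feature choice, the simple 2-means clustering, and the minority-cluster heuristic are all assumptions standing in for ProtegoFed's real feature extraction and global secondary clustering.

```python
import numpy as np

def freq_features(grads):
    """Map per-sample gradient vectors to frequency-domain magnitude features.

    grads: array of shape (n_samples, grad_dim). Assumption: a real FFT over
    the flattened gradient is used as a stand-in for the paper's features.
    """
    return np.abs(np.fft.rfft(grads, axis=1))

def two_means(X, iters=20):
    """Minimal 2-cluster k-means (illustrative stand-in for the clustering step).

    Deterministic init: the lowest- and highest-norm feature vectors,
    so the two centers start far apart.
    """
    norms = np.linalg.norm(X, axis=1)
    centers = np.stack([X[norms.argmin()], X[norms.argmax()]])
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def flag_poisoned(grads):
    """Flag the smaller frequency-domain cluster as suspected poisoned samples.

    Heuristic assumption: poisoned samples are the minority and share a
    distinctive frequency signature, so they form the smaller cluster.
    """
    labels = two_means(freq_features(grads))
    minority = np.argmin(np.bincount(labels, minlength=2))
    return labels == minority
```

In a federated setting, each client would run the feature extraction locally and share only cluster statistics (not raw gradients or instructions) for the global, cross-client second clustering stage; the sketch above collapses both stages into one for clarity.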
Problem

Research questions and friction points this paper is trying to address.

Federated Instruction Tuning
Backdoor Attack
Poisoned Data
Cross-silo Federated Learning
Data Poisoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Instruction Tuning
Backdoor Defense
Poisoned Data Detection
Frequency-domain Gradient
Collaborative Clustering