ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of federated instruction tuning to backdoor attacks caused by poisoned data distributed across participants, a scenario where existing defenses fail. To mitigate this threat without requiring clients to share private instructions, the authors propose a novel federated framework that identifies poisoned samples through frequency-domain gradient signals and introduces a global two-stage clustering mechanism to enable cross-client collaborative detection and sanitization. This approach achieves the first effective identification and removal of distributed poisoned data in federated instruction tuning, attaining poisoning detection rates of 92.00%–100.00% across four federated datasets, reducing attack success rates to near zero, and preserving main-task performance—thereby significantly enhancing the security of federated instruction tuning.

📝 Abstract
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings on natural backdoors, together with common training-data collection practices, suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients even when the clients themselves are benign. This work systematically examines this threat in FIT, demonstrating that existing defenses are ineffective when poisoned data is interspersed among all clients. Addressing this challenge entails two major difficulties: identifying the distinctive characteristics of poisoned samples at each client, and enabling collaborative defense when some clients are heavily dominated by poisoned samples. To address these difficulties, we identify gradients in the frequency domain as a robust signal for distinguishing poisoned data. We further propose a global secondary clustering mechanism that facilitates collaborative identification of poisoned samples across clients. In summary, this paper introduces ProtegoFed, the first backdoor-free FIT framework that accurately detects, removes, and even purifies interspersed poisoned data across clients during training. Experimental results on four FL datasets show that ProtegoFed identifies $92.00\% \sim 100.00\%$ of poisoned samples, reduces the attack success rate to almost zero, and maintains utility on the main task. Code is available at https://github.com/dongdongzhaoUP/ProtegoFed.
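The defense pipeline the abstract describes (per-sample gradients mapped to the frequency domain, then clustered to separate poisoned from clean samples) might be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the FFT-magnitude feature choice, the simple 2-means clustering, and the minority-cluster heuristic are all assumptions standing in for ProtegoFed's real feature extraction and global secondary clustering.

```python
import numpy as np

def freq_features(grads):
    """Map per-sample gradient vectors to frequency-domain magnitude features.

    grads: array of shape (n_samples, grad_dim). Assumption: a real FFT over
    the flattened gradient is used as a stand-in for the paper's features.
    """
    return np.abs(np.fft.rfft(grads, axis=1))

def two_means(X, iters=20):
    """Minimal 2-cluster k-means (illustrative stand-in for the clustering step).

    Deterministic init: the lowest- and highest-norm feature vectors,
    so the two centers start far apart.
    """
    norms = np.linalg.norm(X, axis=1)
    centers = np.stack([X[norms.argmin()], X[norms.argmax()]])
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def flag_poisoned(grads):
    """Flag the smaller frequency-domain cluster as suspected poisoned samples.

    Heuristic assumption: poisoned samples are the minority and share a
    distinctive frequency signature, so they form the smaller cluster.
    """
    labels = two_means(freq_features(grads))
    minority = np.argmin(np.bincount(labels, minlength=2))
    return labels == minority
```

In a federated setting, each client would run the feature extraction locally and share only cluster statistics (not raw gradients or instructions) for the global, cross-client second clustering stage; the sketch above collapses both stages into one for clarity.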
Problem

Research questions and friction points this paper is trying to address.

Federated Instruction Tuning
Backdoor Attack
Poisoned Data
Cross-silo Federated Learning
Data Poisoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Instruction Tuning
Backdoor Defense
Poisoned Data Detection
Frequency-domain Gradient
Collaborative Clustering