🤖 AI Summary
This work identifies a novel backdoor threat in prompt-based federated learning (FL) for multimodal models, where malicious clients jointly optimize local backdoor triggers and prompt embeddings—without modifying model parameters—to inject harmful prompts into the global aggregation process.
Method: We propose the first prompt-level backdoor attack framework tailored to multimodal contrastive learning (e.g., CLIP). It exploits the contextual learning behavior of CLIP-style architectures so that the backdoor activates across modalities, remains stealthy, and generalizes across settings. The approach combines prompt learning, FL aggregation mechanisms, and adversarial trigger optimization.
Contribution/Results: Evaluated across multiple datasets and aggregation protocols, the attack achieves >90% success rate with only a few malicious clients, while preserving full accuracy on benign tasks. This work establishes a critical analytical paradigm and empirical benchmark for security assessment of multimodal FL systems.
📝 Abstract
Prompt-based tuning has emerged as a lightweight alternative to full fine-tuning in large vision-language models, enabling efficient adaptation via learned contextual prompts. This paradigm has recently been extended to federated learning settings (e.g., PromptFL), where clients collaboratively train prompts under data privacy constraints. However, the security implications of prompt-based aggregation in federated multimodal learning remain largely unexplored, leaving a critical attack surface unaddressed. In this paper, we introduce BadPromptFL, the first backdoor attack targeting prompt-based federated learning in multimodal contrastive models. In BadPromptFL, compromised clients jointly optimize local backdoor triggers and prompt embeddings, injecting poisoned prompts into the global aggregation process. These prompts are then propagated to benign clients, enabling universal backdoor activation at inference without modifying model parameters. Leveraging the contextual learning behavior of CLIP-style architectures, BadPromptFL achieves high attack success rates (e.g., >90%) with minimal visibility and limited client participation. Extensive experiments across multiple datasets and aggregation protocols validate the effectiveness, stealth, and generalizability of our attack, raising critical concerns about the robustness of prompt-based federated learning in real-world deployments.
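To make the aggregation step concrete, the following is a minimal sketch of how a server might combine client prompt embeddings. It assumes a FedAvg-style weighted mean, which the abstract does not specify; the function name `fedavg_prompts` and the toy values are hypothetical and are not taken from the paper. The sketch only illustrates why the aggregate is sensitive to every submitted prompt while the backbone model weights are never touched.

```python
from typing import List, Sequence

# A prompt is a list of context-token embeddings: n_ctx rows of `dim` floats.
Prompt = List[List[float]]


def fedavg_prompts(client_prompts: Sequence[Prompt],
                   weights: Sequence[float]) -> Prompt:
    """Weighted mean of client prompt embeddings (assumed FedAvg-style rule)."""
    if not client_prompts or len(client_prompts) != len(weights):
        raise ValueError("need one weight per client prompt")
    total = float(sum(weights))
    n_ctx = len(client_prompts[0])
    dim = len(client_prompts[0][0])
    global_prompt = [[0.0] * dim for _ in range(n_ctx)]
    for prompt, w in zip(client_prompts, weights):
        for t in range(n_ctx):
            for d in range(dim):
                # Only prompt embeddings are averaged; model weights stay frozen.
                global_prompt[t][d] += (w / total) * prompt[t][d]
    return global_prompt


# Toy example: three clients agree, one client submits a shifted prompt.
agreeing = [[[1.0, 1.0]] for _ in range(3)]
shifted = [[[5.0, 1.0]]]
aggregated = fedavg_prompts(agreeing + shifted, weights=[1, 1, 1, 1])
print(aggregated)  # [[2.0, 1.0]]
```

In this toy run a single deviating client out of four moves the first coordinate of the shared prompt from 1.0 to 2.0, and every client that downloads the aggregate inherits that shift. This is the propagation path the abstract refers to, shown under the stated averaging assumption only.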