A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Backdoor defenses in federated learning typically assume homogeneous client data distributions or a clean server-side dataset, assumptions that break down under non-IID data. This paper proposes CLIP-Fed, the first framework to leverage the zero-shot capability of a vision-language pre-trained model (CLIP) for federated backdoor defense. CLIP-Fed builds an augmented server-side dataset without any client samples, using a multimodal large language model and frequency-domain analysis, and applies a combined pre- and post-aggregation defense strategy. This strategy jointly employs a prototype contrastive loss and KL-divergence regularization to decouple trigger patterns from their target labels, yielding a robust defense that requires no client data. Evaluated on CIFAR-10 and CIFAR-10-LT, CLIP-Fed reduces the average attack success rate (ASR) by 2.03% and 1.35%, respectively, while improving model accuracy (MA) by 7.92% and 0.48%, significantly outperforming state-of-the-art methods.

📝 Abstract
Existing backdoor defense methods in Federated Learning (FL) rely on the assumption of homogeneous client data distributions or the availability of a clean server dataset, which limits their practicality and effectiveness. Defending against backdoor attacks under heterogeneous client data distributions while preserving model performance remains a significant challenge. In this paper, we propose an FL backdoor defense framework named CLIP-Fed, which leverages the zero-shot learning capabilities of vision-language pre-training models. By integrating both pre-aggregation and post-aggregation defense strategies, CLIP-Fed overcomes the limitations that Non-IID data imposes on defense effectiveness. To address privacy concerns and broaden the dataset's coverage of diverse triggers, we construct and augment the server dataset using a multimodal large language model and frequency analysis, without any client samples. To correct class-prototype deviations caused by backdoor samples and eliminate the correlation between trigger patterns and target labels, CLIP-Fed aligns the knowledge of the global model and CLIP on the augmented dataset using a prototype contrastive loss and Kullback-Leibler divergence. Extensive experiments on representative datasets validate the effectiveness of CLIP-Fed. Compared to state-of-the-art methods, CLIP-Fed achieves an average reduction in ASR of 2.03% on CIFAR-10 and 1.35% on CIFAR-10-LT, while improving average MA by 7.92% and 0.48%, respectively.
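The alignment step described above (prototype contrastive loss plus KL divergence between the global model and CLIP) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `kl_alignment_loss` and `prototype_contrastive_loss` functions, the temperature values, and the random stand-ins for CLIP's zero-shot logits and the class prototypes are all assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kl_alignment_loss(student_logits, clip_logits, tau=2.0):
    # Mean KL(CLIP teacher || global-model student) on temperature-softened
    # distributions: nudges the aggregated model toward CLIP's zero-shot view,
    # weakening any learned trigger-to-label shortcut.
    p = softmax(clip_logits / tau)
    q = softmax(student_logits / tau)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)
    return float(kl.mean() * tau ** 2)

def prototype_contrastive_loss(features, labels, prototypes, tau=0.1):
    # InfoNCE over class prototypes: each sample's (normalized) feature
    # should be most similar to its own class prototype and dissimilar
    # to the others, correcting prototype drift from backdoor samples.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = f @ p.T / tau                      # (batch, classes) similarities
    log_probs = np.log(softmax(logits) + 1e-12)
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Toy usage with random stand-ins for the real global-model outputs,
# CLIP zero-shot logits, and per-class feature prototypes.
rng = np.random.default_rng(0)
B, C, D = 8, 10, 64
total = kl_alignment_loss(rng.normal(size=(B, C)), rng.normal(size=(B, C))) \
      + prototype_contrastive_loss(rng.normal(size=(B, D)),
                                   rng.integers(0, C, B),
                                   rng.normal(size=(C, D)))
print(round(total, 4))
```

In practice both terms would be computed on the augmented server dataset and combined into one regularization objective for the global model; the weighting between them is not specified in this summary.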
Problem

Research questions and friction points this paper is trying to address.

Defending FL backdoor attacks under heterogeneous data distributions
Overcoming Non-IID limitations in backdoor defense effectiveness
Eliminating trigger-target label correlation without client samples
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vision-language pre-training for defense
Integrates pre and post-aggregation strategies
Augments the server dataset via a multimodal LLM and frequency analysis
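The frequency-analysis augmentation mentioned above can be illustrated with a generic Fourier-domain mixing routine. The paper's exact recipe is not given in this summary, so the amplitude-blending scheme, the `alpha` mixing weight, and the low-frequency `radius` below are all hypothetical choices, shown only to convey how frequency-domain mixing can diversify trigger-like high-frequency patterns.

```python
import numpy as np

def frequency_blend(image, donor, alpha=0.15, radius=4):
    # Blend the donor's high-frequency amplitude spectrum into the image
    # while keeping the image's phase and low frequencies intact. High
    # frequencies are where patch-style backdoor triggers tend to live.
    F_img = np.fft.fftshift(np.fft.fft2(image))
    F_don = np.fft.fftshift(np.fft.fft2(donor))
    amp, phase = np.abs(F_img), np.angle(F_img)
    amp_don = np.abs(F_don)

    h, w = image.shape
    cy, cx = h // 2, w // 2
    high_freq = np.ones((h, w), dtype=bool)
    high_freq[cy - radius:cy + radius, cx - radius:cx + radius] = False

    amp[high_freq] = (1 - alpha) * amp[high_freq] + alpha * amp_don[high_freq]
    mixed = amp * np.exp(1j * phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))
```

With `alpha=0` the routine returns the original image (up to FFT round-off), so the mixing strength directly controls how far each augmented sample drifts from the clean distribution.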
Keke Gai
Beijing Institute of Technology
Cyber Security, Blockchain, AI Security, Privacy-preserving Computation, FinTech
Dongjue Wang
School of Cyberspace Science and Technology, Beijing Institute of Technology
Jing Yu
Northwestern University
Sustainability, Life Cycle Analysis, Transportation Management, Operations Research
Liehuang Zhu
School of Cyberspace Science and Technology, Beijing Institute of Technology
Qi Wu
Australian Institute of Machine Learning, The University of Adelaide