🤖 AI Summary
Federated learning (FL) poisoning-defense research lacks standardized evaluation protocols. The resulting methodological pitfalls, such as dataset bias and oversimplified attack assumptions, produce spurious claims of robustness.
Method: We propose a three-dimensional classification framework for FL defenses, conduct a systematic survey of 50 top-tier defense papers, and perform a case-driven critical re-evaluation of three representative defenses to identify and quantify six prevalent evaluation pitfalls.
Results: We find that 30% of studies rely solely on MNIST, 40% employ weak attacks (e.g., label-flipping only), and three widely adopted defenses exhibit substantially degraded robustness under rigorous, corrected evaluation. Our work establishes reproducible, trustworthy FL defense evaluation guidelines; delivers actionable, implementation-ready recommendations; and advances the field from superficially effective defenses toward genuinely reliable ones.
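To make the "weak attacks" pitfall concrete, here is a minimal, hypothetical sketch (not code from the paper) of the label-flipping-only attack style the survey flags as a simplistic baseline: a malicious client statically remaps labels before local training, a far weaker adversary than an adaptive model-poisoning attacker.

```python
# Hypothetical illustration of a static label-flipping data-poisoning attack.
# The flip rule y -> (num_classes - 1 - y) is a common convention; it is an
# assumption here, not a detail taken from the surveyed papers.
def flip_labels(dataset, num_classes=10):
    """Return a poisoned copy of (sample, label) pairs with labels flipped."""
    return [(x, num_classes - 1 - y) for x, y in dataset]

poisoned = flip_labels([("img0", 0), ("img1", 7)])
print(poisoned)  # [('img0', 9), ('img1', 2)]
```

Because the flip rule is fixed and data-only, the poisoned updates stay close to benign ones, which is why defenses evaluated solely against it can appear more robust than they are.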
📝 Abstract
While the community has designed various defenses to counter the threat of poisoning attacks in Federated Learning (FL), there are no guidelines for evaluating these defenses. These defenses are prone to subtle pitfalls in their experimental setups that lead to a false sense of security, rendering them unsuitable for practical deployment. In this paper, we systematically identify these challenges and provide a better approach to addressing them. First, we design a comprehensive systematization of FL defenses along three dimensions: i) how client updates are processed, ii) what the server knows, and iii) at what stage the defense is applied. Next, we thoroughly survey 50 top-tier defense papers and identify the components commonly used in their evaluation setups. Based on this survey, we uncover six distinct pitfalls and study their prevalence. For example, we discover that around 30% of these works solely use the intrinsically robust MNIST dataset, and 40% employ simplistic attacks, which may inadvertently portray their defense as robust. Using three representative defenses as case studies, we perform a critical re-evaluation to study the impact of the identified pitfalls and show how they lead to incorrect conclusions about robustness. We provide actionable recommendations to help researchers overcome each pitfall.
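As a minimal sketch of the first systematization dimension (how client updates are processed), the toy example below compares plain averaging with coordinate-wise median aggregation, one well-known family of robust update-processing defenses. All client updates and the attack vector here are illustrative assumptions, not experimental values from the paper.

```python
# Toy comparison: mean vs. coordinate-wise median aggregation of client
# updates when one client submits a scaled malicious update.
from statistics import mean, median

def aggregate(updates, combine):
    """Combine client updates coordinate-wise with the given statistic."""
    return [combine(coord) for coord in zip(*updates)]

# Three honest clients send similar updates; one attacker sends a
# strongly scaled update (a simple model-poisoning attack).
honest = [[0.9, 1.1], [1.0, 1.0], [1.1, 0.9]]
malicious = [[-10.0, -10.0]]
updates = honest + malicious

print(aggregate(updates, mean))    # [-1.75, -1.75]: dragged toward the attacker
print(aggregate(updates, median))  # [0.95, 0.95]: stays near the honest updates
```

The median resists a single outlier here, but as the paper's re-evaluations suggest, such robustness claims only hold up when the defense is tested against strong, adaptive attacks rather than simplistic ones.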